Abstract Over the past decade, a large number of jet substructure observables have been proposed in the literature, and explored at the LHC experiments. Such observables attempt to utilize the internal structure of jets in order to distinguish those initiated by quarks, gluons, or by boosted heavy objects, such as top quarks and W bosons. This report, originating from and motivated by the BOOST2013 workshop, presents original particle-level studies that aim to improve our understanding of the relationships between jet substructure observables, their complementarity, and their dependence on the underlying jet properties, particularly the jet radius and jet transverse momentum. This is explored in the context of quark/gluon discrimination, boosted W boson tagging and boosted top quark tagging.

Keywords boosted objects • jet substructure • beyond-the-Standard-Model physics searches • Large Hadron Collider


1 Introduction 

The center-of-mass energies at the Large Hadron Collider are large compared to the masses of the heaviest known particles, even after accounting for parton distribution functions. With the start of the second phase of operation in 2015, the center-of-mass energy will further increase from 7 TeV in 2010–2011 and 8 TeV in 2012 to 13 TeV. Thus, even the heaviest states in the Standard Model (and potentially previously unknown particles) will often be produced at the LHC with substantial boosts, leading to a collimation of the decay products. For fully hadronic decays, these heavy particles will not be reconstructed as several jets in the detector, but rather as a single hadronic jet with distinctive internal substructure. This realization has led to a new era of sophistication in our understanding of both standard Quantum Chromodynamics (QCD) jets and jets containing the decay of a heavy particle, with an array of new jet observables and detection techniques introduced and studied to distinguish the two types of jets. To allow the efficient propagation of results from these studies of jet substructure, a series of BOOST Workshops has been held on an annual basis: SLAC (2009) [1], Oxford University (2010) [2], Princeton University (2011) [3], IFIC Valencia (2012) [4], University of Arizona (2013) [5], and, most recently, University College London (2014) [6]. Following each of these meetings, working groups have generated reports highlighting the most interesting new results, often including original particle-level studies. Previous BOOST reports can be found in [7–9].

This report from BOOST 2013 thus views the study and implementation of jet substructure techniques as a fairly mature field, and focuses on the question of the correlations between the plethora of observables that have been developed and employed, and their dependence on the underlying jet parameters, especially the jet radius $R$ and jet transverse momentum ($p_T$). In new analyses developed for the report, we investigate the separation of a quark signal from a gluon background (q/g tagging), a W signal from a gluon background (W-tagging) and a top signal from a mixed quark/gluon QCD background (top-tagging). In the case of top-tagging, we also investigate the performance of dedicated top-tagging algorithms, the HEPTopTagger [10] and the Johns Hopkins Tagger [11]. We study the degree to which the discriminatory information provided by the observables and taggers overlaps by examining the extent to which the signal-background separation performance increases when two or more variables/taggers are combined in a multivariate analysis. Where possible, we provide a discussion of the physics behind the structure of the correlations and the $p_T$ and $R$ scaling that we observe.

We present the performance of observables in idealized simulations without pile-up and detector resolution effects; the relationship between substructure observables, their correlations, and how these depend on the jet radius $R$ and jet $p_T$ should not be too sensitive to such effects. Conducting studies using idealized simulations allows us to more clearly elucidate the underlying physics behind the observed performance, and also provides benchmarks for the development of techniques to mitigate pile-up and detector effects. A full study of the performance of pile-up and detector mitigation strategies is beyond the scope of the current report, and will be the focus of upcoming studies.

The report is organized as follows: in Sections 2–4, we describe the methods used in carrying out our analysis, with a description of the Monte Carlo event sample generation in Section 2, the jet algorithms, observables and taggers investigated in our report in Section 3, and an overview of the multivariate techniques used to combine multiple observables into single discriminants in Section 4. Our results follow in Sections 5–7, with q/g-tagging studies in Section 5, W-tagging studies in Section 6, and top-tagging studies in Section 7. Finally, we summarize the studies and offer general conclusions in Section 8.

The principal organizers of and contributors to the analyses presented in this report are: B. Cooper, S. D. Ellis, M. Freytsis, A. Hornig, A. Larkoski, D. Lopez Mateos, B. Shuve, and N. V. Tran.

2 Monte Carlo Samples 

Below, we describe the Monte Carlo samples used in the q/g-tagging, W-tagging, and top-tagging sections of this report. Note that no pile-up (additional proton-proton interactions beyond the hard scatter) is included in any sample, and there is no attempt to emulate the degradation in angular and $p_T$ resolution that would result when reconstructing the jets inside a real detector; such effects are deferred to future study.


2.1 Quark/gluon and W-tagging 

Samples were generated at $\sqrt{s} = 8$ TeV for QCD dijets, and for $W^+W^-$ pairs produced in the decay of a scalar resonance. The W bosons are decayed hadronically. The QCD events were split into subsamples of gg and qq events, allowing for tests of discrimination of hadronic W bosons, quarks, and gluons.

Individual gg and qq samples were produced at leading order (LO) using MadGraph5 [12], while $W^+W^-$ samples were generated using the JHU Generator [13–15]. Both were generated using CTEQ6L1 PDFs [16]. The samples were produced in exclusive $p_T$ bins of width 100 GeV, with the slicing parameter chosen to be the $p_T$ of any final state parton or W at LO. At the parton level, the $p_T$ bins investigated in this report were 300–400 GeV, 500–600 GeV and 1.0–1.1 TeV. The samples were then showered through PYTHIA8 (version 8.176) [17] using the default tune 4C [18]. For each of the various samples (W, q, g) and $p_T$ bins, 500k events were simulated.


2.2 Top-tagging 

Samples were generated at $\sqrt{s} = 14$ TeV. Standard Model dijet and top pair samples were produced with Sherpa 2.0.0 [19–24], with matrix elements of up to two extra partons matched to the shower. The top samples included only hadronic decays and were generated in exclusive $p_T$ bins of width 100 GeV, taking as slicing parameter the top quark $p_T$. The QCD samples were generated with a lower cut on the leading parton-level jet $p_T$, where parton-level jets are clustered with the anti-$k_T$ algorithm and jet radii of $R$ = 0.4, 0.8, 1.2. The matching scale is selected to be $Q_{\rm cut}$ = 40, 60, 80 GeV for the $p_T^{\rm min}$ = 600, 1000, and 1500 GeV bins, respectively. For the top samples, 100k events were generated in each bin, while 200k QCD events were generated in each bin.


3 Jet Algorithms and Substructure Observables 

In Sections 3.1, 3.2, 3.3 and 3.4, we describe the various jet algorithms, groomers, taggers and other substructure variables used in these studies. Over the course of our study, we considered a larger set of observables, but for presentation purposes we included only a subset in the final analysis, eliminating redundant observables.

We organize the algorithms into four categories: clustering algorithms, grooming algorithms, tagging algorithms, and other substructure variables that incorporate information about the shape of radiation inside the jet. We note that this labelling is somewhat ambiguous: for example, some of the “grooming” algorithms (such as trimming and pruning) as well as N-subjettiness can be used in a “tagging” capacity. This ambiguity is particularly pronounced in multivariate analyses, such as the ones we present here, since a single variable can act in different roles depending on which other variables it is combined with. Therefore, the following classification is intended only to give an approximate organization of the variables, rather than as a definitive taxonomy.

Before describing the observables used in our analysis, 
we give our definition of jet constituents. As a starting point, 
we can think of the final state of an LHC collision event 
as being described by a list of “final state particles”. In the 
analyses of the simulated events described below (with no 
detector simulation), these particles include the sufficiently 
long lived protons, neutrons, photons, pions, electrons and 
muons with no requirements on $p_T$ or rapidity. Neutrinos
are excluded from the jet analyses. 

3.1 Jet Clustering Algorithms 

Jet clustering: Jets were clustered using sequential jet clustering algorithms [25] implemented in FastJet 3.0.3. Final state particles $i$, $j$ are assigned a mutual distance $d_{ij}$ and a distance to the beam, $d_{iB}$. The particle pair with the smallest $d_{ij}$ is recombined and the algorithm repeated until the smallest distance is from a particle $i$ to the beam, $d_{iB}$, in which case $i$ is set aside and labelled as a jet. The distance metrics are defined as

$$d_{ij} = \min\left(p_{Ti}^{2\gamma},\, p_{Tj}^{2\gamma}\right)\frac{\Delta R_{ij}^2}{R^2}, \tag{1}$$

$$d_{iB} = p_{Ti}^{2\gamma}, \tag{2}$$

where $\Delta R_{ij}^2 = (\Delta\eta_{ij})^2 + (\Delta\phi_{ij})^2$, with $\Delta\eta_{ij}$ being the separation in pseudorapidity of particles $i$ and $j$, and $\Delta\phi_{ij}$ being the separation in azimuth. In this analysis, we use the anti-$k_T$ algorithm ($\gamma = -1$) [26], the Cambridge/Aachen (C/A) algorithm ($\gamma = 0$) [27, 28], and the $k_T$ algorithm ($\gamma = 1$) [29, 30], each of which has varying sensitivity to soft radiation in the definition of the jet.
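As an illustration of Eqs. (1) and (2), the following minimal Python sketch performs sequential recombination. It is not FastJet and rests on simplifying assumptions of our own: inputs are massless particles stored as (px, py, pz, E) tuples, recombination simply adds four-momenta (the E-scheme), and the naive search over all pairs scales as O(N³). All function names are hypothetical.

```python
import math


def from_pt_eta_phi(pt_, eta_, phi_):
    """Massless four-momentum (px, py, pz, E) from (pT, eta, phi)."""
    return (pt_ * math.cos(phi_), pt_ * math.sin(phi_),
            pt_ * math.sinh(eta_), pt_ * math.cosh(eta_))


def pt(p):
    return math.hypot(p[0], p[1])


def eta(p):
    # Pseudorapidity from the momentum direction; assumes pT > 0.
    return math.asinh(p[2] / pt(p))


def phi(p):
    return math.atan2(p[1], p[0])


def delta_r2(a, b):
    """(Delta eta)^2 + (Delta phi)^2, with Delta phi wrapped into [-pi, pi)."""
    dphi = (phi(a) - phi(b) + math.pi) % (2 * math.pi) - math.pi
    return (eta(a) - eta(b)) ** 2 + dphi ** 2


def cluster(particles, R, gamma):
    """Generalized-kT clustering: gamma = -1 (anti-kT), 0 (C/A), +1 (kT)."""
    active = [tuple(p) for p in particles]
    jets = []
    while active:
        best = None  # (distance, i, j); j is None for the beam distance d_iB
        for i, a in enumerate(active):
            w_a = pt(a) ** (2 * gamma)
            if best is None or w_a < best[0]:
                best = (w_a, i, None)  # d_iB, Eq. (2)
            for j in range(i + 1, len(active)):
                b = active[j]
                d_ij = min(w_a, pt(b) ** (2 * gamma)) * delta_r2(a, b) / R ** 2  # Eq. (1)
                if d_ij < best[0]:
                    best = (d_ij, i, j)
        _, i, j = best
        if j is None:
            jets.append(active.pop(i))  # closest to the beam: set aside as a jet
        else:
            b = active.pop(j)  # j > i, so popping j first leaves index i valid
            a = active.pop(i)
            active.append(tuple(x + y for x, y in zip(a, b)))  # E-scheme recombination
    return jets


particles = [from_pt_eta_phi(100.0, 0.0, 0.0),
             from_pt_eta_phi(50.0, 0.1, 0.0),
             from_pt_eta_phi(80.0, 0.0, 2.0)]
jets = cluster(particles, R=0.4, gamma=-1)
```

For these three toy particles, anti-$k_T$ with $R$ = 0.4 merges the two nearby particles into a 150 GeV jet and leaves the distant one as a separate 80 GeV jet, while C/A with a large $R$ = 3.0 merges everything into one jet.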

This process of jet clustering serves to identify jets as (non-overlapping) sub-lists of final state particles within the original event-wide list. The particles on the sub-list corresponding to a specific jet are labeled the “constituents” of that jet, and most of the tools described here process this sub-list of jet constituents in some specific fashion to determine some property of that jet. The concept of constituents of a jet can be generalized to a more detector-centric version where the constituents are, for example, tracks and calorimeter cells, or to a perturbative QCD version where the constituents are partons (quarks and gluons). These different descriptions are not identical, but are closely related. We will focus on the MC-based analysis of simulated events, while drawing insight from the perturbative QCD view. Note also that, when a detector (with a magnetic field) is included in the analysis, there will generally be a minimum $p_T$ requirement on the constituents, so that realistic numbers of constituents will be smaller than, but presumably still proportional to, the numbers found in the analyses described here.

Qjets: We also perform non-deterministic jet clustering [31, 32]. Instead of always clustering the particle pair with the smallest distance $d_{ij}$, the pair selected for combination is chosen probabilistically according to a measure

$$P_{ij} \propto \exp\left\{-\alpha\,\frac{d_{ij} - d_{\rm min}}{d_{\rm min}}\right\}, \tag{3}$$

where $d_{\rm min}$ is the minimum distance for the usual jet clustering algorithm at a particular step. This leads to a different cluster sequence for the jet each time the Qjet algorithm is used, and consequently different substructure properties. The parameter $\alpha$ is called the rigidity and is used to control how sharply peaked the probability distribution is around the usual, deterministic value. The Qjets method uses statistical analysis of the resulting distributions to extract more information from the jet than can be found in the usual cluster sequence.
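The per-step selection rule of Eq. (3) can be sketched as follows. This is a hypothetical illustration rather than the Qjets implementation of [31, 32]: it assumes the candidate distances $d_{ij}$ for the current step are already computed and that $d_{\rm min} > 0$.

```python
import math
import random


def qjet_weights(distances, alpha):
    """Normalized Qjets probabilities, P_ij ∝ exp(-alpha (d_ij - d_min) / d_min), Eq. (3).

    `distances` are the candidate pair distances d_ij at the current clustering step;
    d_min must be positive.
    """
    d_min = min(distances)
    raw = [math.exp(-alpha * (d - d_min) / d_min) for d in distances]
    total = sum(raw)
    return [w / total for w in raw]


def choose_pair(pairs, distances, alpha, rng=random):
    """Randomly pick the pair to merge; a large alpha recovers the deterministic choice."""
    return rng.choices(pairs, weights=qjet_weights(distances, alpha), k=1)[0]


weights = qjet_weights([1.0, 2.0, 3.0], alpha=0.1)
```

For the rigidity $\alpha$ = 0.1 used in this report, a pair with twice the minimal distance still receives a weight of $e^{-0.1} \approx 0.9$ relative to the minimal pair, so the sampling is far from deterministic; increasing $\alpha$ sharpens the distribution toward the usual choice.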


3.2 Jet Grooming Algorithms 

Pruning: Given a jet, re-cluster the constituents using the C/A algorithm. At each step, proceed with the merger as usual unless both

$$\frac{\min(p_{Ti}, p_{Tj})}{p_{Tij}} < z_{\rm cut} \quad\text{and}\quad \Delta R_{ij} > R_{\rm cut}\,\frac{2 m_J}{p_{TJ}}, \tag{4}$$

in which case the merger is vetoed and the softer branch discarded. Here $p_{Tij}$ is the transverse momentum of the merged pair, and $m_J$ and $p_{TJ}$ are the mass and transverse momentum of the original jet. The default parameters used for pruning [33] in this report are $z_{\rm cut} = 0.1$ and $R_{\rm cut} = 0.5$, unless otherwise stated. One advantage of pruning is that the thresholds used to veto soft, wide-angle radiation scale with the jet kinematics, and so the algorithm is expected to perform comparably over a wide range of momenta.
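A sketch of the pruning veto of Eq. (4) for a single candidate merger is given below. It assumes the branch kinematics and the original jet's mass and $p_T$ are already known; the full C/A re-clustering loop is omitted, and the function name is our own.

```python
def pruning_vetoes(pt_i, pt_j, pt_ij, delta_r_ij, jet_mass, jet_pt, z_cut=0.1, r_cut=0.5):
    """Return True if pruning (Eq. 4) vetoes the C/A merger of branches i and j.

    pt_ij is the pT of the merged pair; jet_mass and jet_pt describe the original jet,
    which sets the angular scale D_cut = r_cut * 2 * m_J / pT_J.
    """
    z = min(pt_i, pt_j) / pt_ij
    d_cut = r_cut * 2.0 * jet_mass / jet_pt
    # Both conditions must hold for the softer branch to be discarded.
    return z < z_cut and delta_r_ij > d_cut


# Jet with m_J = 80 GeV and pT_J = 500 GeV, so D_cut = 0.16.
soft_wide = pruning_vetoes(10.0, 300.0, 310.0, 0.5, jet_mass=80.0, jet_pt=500.0)
hard_wide = pruning_vetoes(100.0, 300.0, 400.0, 0.5, jet_mass=80.0, jet_pt=500.0)
soft_collinear = pruning_vetoes(10.0, 300.0, 310.0, 0.1, jet_mass=80.0, jet_pt=500.0)
```

Only the soft, wide-angle merger is vetoed; a hard splitting or a soft but collinear one is kept, which is what allows pruning to remove diffuse contamination while retaining the hard prongs.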


Trimming: Given a jet, re-cluster the constituents into subjets of radius $R_{\rm trim}$ with the $k_T$ algorithm. Discard all subjets $i$ with

$$p_{Ti} < f_{\rm cut}\, p_T^{\rm jet}. \tag{5}$$

The default parameters used for trimming [34] in this report are $R_{\rm trim} = 0.2$ and $f_{\rm cut} = 0.03$, unless otherwise stated.
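The trimming criterion of Eq. (5) reduces to a threshold on subjet $p_T$. The sketch below assumes the $k_T$ re-clustering into $R_{\rm trim}$ subjets has already been done and works only with the subjet $p_T$ values; the function name is hypothetical.

```python
def trim(subjet_pts, jet_pt, f_cut=0.03):
    """Trimming, Eq. (5): keep only subjets with pT_i >= f_cut * pT_jet.

    subjet_pts are the pT values of subjets already re-clustered with R_trim (kT),
    which this sketch does not do itself.
    """
    threshold = f_cut * jet_pt
    return [pt for pt in subjet_pts if pt >= threshold]


kept = trim([300.0, 150.0, 10.0, 5.0], jet_pt=465.0)
```

With $p_T^{\rm jet}$ = 465 GeV and $f_{\rm cut}$ = 0.03, the threshold is about 14 GeV, so the 10 GeV and 5 GeV subjets are removed.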


Filtering: Given a jet, re-cluster the constituents into subjets of radius $R_{\rm filt}$ with the C/A algorithm. Re-define the jet to consist of only the hardest N subjets, where N is determined by the final state topology and is typically one more than the number of hard prongs in the resonance decay (to include the leading final-state gluon emission) [35]. While we do not independently use filtering, it is an important step of the HEPTopTagger to be defined later.
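For comparison with trimming, filtering keeps a fixed number of subjets rather than applying a relative $p_T$ threshold. In this hypothetical sketch the subjets are again represented only by their $p_T$ values (the C/A re-clustering into $R_{\rm filt}$ subjets is assumed done), and N is set to the typical choice of one more than the number of hard prongs.

```python
def filter_subjets(subjet_pts, n_prongs):
    """Filtering: keep the N hardest subjets, with N = n_prongs + 1 (the typical choice,
    allowing for one extra final-state gluon emission)."""
    n_keep = n_prongs + 1
    return sorted(subjet_pts, reverse=True)[:n_keep]


# A three-prong (top-like) jet: keep the four hardest subjets.
kept = filter_subjets([40.0, 200.0, 5.0, 150.0, 90.0, 12.0], n_prongs=3)
```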


Soft drop: Given a jet, re-cluster all of the constituents using the C/A algorithm. Iteratively undo the last stage of the C/A clustering from $j$ into subjets $j_1$, $j_2$. If

$$\frac{\min(p_{T1}, p_{T2})}{p_{T1} + p_{T2}} < z_{\rm cut}\left(\frac{\Delta R_{12}}{R}\right)^{\beta}, \tag{6}$$

discard the softer subjet and repeat. Otherwise, take $j$ to be the final soft-drop jet [36]. Soft drop has two input parameters, the angular exponent $\beta$ and the soft-drop scale $z_{\rm cut}$. In these studies we use the default $z_{\rm cut} = 0.1$ setting, with $\beta = 2$.
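The soft-drop declustering loop can be sketched on a toy clustering tree as follows. The node layout, the $p_T$-weighted axis used when merging, and the default $R$ = 0.8 are assumptions made for illustration; a real implementation would recluster the constituents with C/A rather than build the tree by hand.

```python
import math


def leaf(pt, eta, phi):
    return {"pt": pt, "eta": eta, "phi": phi, "children": None}


def merge(a, b):
    """Parent node for two branches; its axis is a pT-weighted average (a simplification
    that ignores the phi = +/-pi seam)."""
    pt = a["pt"] + b["pt"]
    return {"pt": pt,
            "eta": (a["pt"] * a["eta"] + b["pt"] * b["eta"]) / pt,
            "phi": (a["pt"] * a["phi"] + b["pt"] * b["phi"]) / pt,
            "children": (a, b)}


def delta_r(a, b):
    dphi = (a["phi"] - b["phi"] + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(a["eta"] - b["eta"], dphi)


def soft_drop(node, z_cut=0.1, beta=2.0, R=0.8):
    """Decluster a C/A tree, dropping the softer branch while Eq. (6) is satisfied."""
    while node["children"] is not None:
        j1, j2 = node["children"]
        z = min(j1["pt"], j2["pt"]) / (j1["pt"] + j2["pt"])
        if z < z_cut * (delta_r(j1, j2) / R) ** beta:
            node = j1 if j1["pt"] >= j2["pt"] else j2  # keep the harder branch
        else:
            break  # condition fails: this node is the soft-drop jet
    return node  # a single leaf is returned if everything else was dropped


hard = merge(leaf(300.0, 0.0, 0.0), leaf(200.0, 0.3, 0.0))
tree = merge(hard, leaf(5.0, 0.7, 0.0))  # soft, wide-angle emission clustered last
groomed = soft_drop(tree)
```

Here the 5 GeV emission at $\Delta R \approx 0.58$ satisfies Eq. (6) and is dropped, while the 300/200 GeV splitting does not, so the groomed jet is the two-prong 500 GeV node.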


3.3 Jet Tagging Algorithms 

Modified Mass Drop Tagger: Given a jet, re-cluster all of the constituents using the C/A algorithm. Iteratively undo the last stage of the C/A clustering from $j$ into subjets $j_1$, $j_2$ with $m_{j_1} > m_{j_2}$. If either

$$m_{j_1} > \mu\, m_j \quad\text{or}\quad \frac{\min\left(p_{Tj_1}^2, p_{Tj_2}^2\right)}{m_j^2}\,\Delta R_{12}^2 < y_{\rm cut}, \tag{7}$$

then discard the branch with the smaller transverse mass $m_T = \sqrt{m^2 + p_T^2}$, and re-define $j$ as the branch with the larger transverse mass. Otherwise, the jet is tagged. If declustering continues until only one branch remains, the jet is considered to have failed the tagging criteria [37]. In this study we use by default $\mu = 1.0$ (i.e. implement no mass drop criterion) and $y_{\rm cut} = 0.1$. With respect to the singular parts of the splitting functions, this describes the same algorithm as running soft drop with $\beta = 0$.
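A single mMDT declustering step from Eq. (7) might be written as below. This is a hypothetical helper that assumes the masses and transverse momenta of both branches and of the parent $j$ are known; iterating it down the C/A tree, and failing the jet when only one branch remains, is left out.

```python
import math


def mmdt_step(m1, m2, pt1, pt2, delta_r12, m_parent, mu=1.0, y_cut=0.1):
    """One mMDT declustering step, Eq. (7); branch 1 is the heavier (m1 >= m2).

    Returns "tagged" if neither failure condition holds; otherwise returns
    "follow_1" or "follow_2", naming the branch with the larger transverse mass
    sqrt(m^2 + pT^2), which becomes the new j.
    """
    y = min(pt1 ** 2, pt2 ** 2) / m_parent ** 2 * delta_r12 ** 2
    if m1 > mu * m_parent or y < y_cut:
        mt1 = math.sqrt(m1 ** 2 + pt1 ** 2)
        mt2 = math.sqrt(m2 ** 2 + pt2 ** 2)
        return "follow_1" if mt1 >= mt2 else "follow_2"
    return "tagged"


balanced = mmdt_step(5.0, 3.0, 300.0, 200.0, 0.32, m_parent=80.0)
asymmetric = mmdt_step(5.0, 3.0, 300.0, 10.0, 0.32, m_parent=80.0)
```

In the balanced case $y$ = 0.64 > $y_{\rm cut}$, so the jet is tagged; in the asymmetric case $y \approx 0.0016$ and declustering continues along the branch with the larger transverse mass.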

Johns Hopkins Tagger: Re-cluster the jet using the C/A algorithm. The jet is iteratively de-clustered, and at each step the softer prong is discarded if its $p_T$ is less than $\delta_p\, p_T^{\rm jet}$. This continues until both prongs are harder than the $p_T$ threshold, both prongs are softer than the $p_T$ threshold, or they are too close ($|\Delta\eta_{12}| + |\Delta\phi_{12}| < \delta_r$); the jet is rejected if either of the latter conditions applies. If both are harder than the $p_T$ threshold, the same procedure is applied to each: this results in 2, 3, or 4 subjets. If there exist 3 or 4 subjets, then the jet is accepted: the top candidate is the sum of the subjets, and the W candidate is the pair of subjets closest to the W mass [11]. The output of the tagger is the mass of the top candidate ($m_t$), the mass of the W candidate ($m_W$), and $\theta_h$, a helicity angle defined as the angle, measured in the rest frame of the W candidate, between the top direction and one of the W decay products. The two free input parameters of the Johns Hopkins Tagger in this study are $\delta_p$ and $\delta_r$, defined above, and their values are optimized for different jet kinematics and parameters in Section 7.
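The decision made at each Johns Hopkins Tagger declustering step can be summarized in a small function. The sketch below assumes the prong momenta and separations are given; the parameter values in the usage ($\delta_p$ = 0.1, $\delta_r$ = 0.19) are purely illustrative and are not the optimized values of Section 7.

```python
def jhtt_step(pt1, pt2, deta, dphi, jet_pt, delta_p, delta_r):
    """Classify one Johns Hopkins Tagger declustering step into prongs 1 and 2.

    Returns "reject" if the prongs are too close or both are soft, "split" if both
    pass the pT threshold (each prong is then declustered further), and
    "keep_1" / "keep_2" if only that prong is hard (the softer one is discarded).
    """
    if abs(deta) + abs(dphi) < delta_r:
        return "reject"  # prongs too close
    threshold = delta_p * jet_pt
    hard1, hard2 = pt1 >= threshold, pt2 >= threshold
    if hard1 and hard2:
        return "split"
    if not hard1 and not hard2:
        return "reject"  # both prongs soft
    return "keep_1" if hard1 else "keep_2"


decisions = [
    jhtt_step(200.0, 150.0, 0.3, 0.2, 600.0, delta_p=0.1, delta_r=0.19),    # split
    jhtt_step(200.0, 30.0, 0.3, 0.2, 600.0, delta_p=0.1, delta_r=0.19),     # keep_1
    jhtt_step(200.0, 150.0, 0.05, 0.05, 600.0, delta_p=0.1, delta_r=0.19),  # reject
]
```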


HEPTopTagger: Re-cluster the jet using the C/A algorithm. The jet is iteratively de-clustered, and at each step the softer prong is discarded if $m_1/m_{12} > \mu$ (there is not a significant mass drop). Otherwise, both prongs are kept. This continues until a prong has a mass $m_i < m$, at which point it is added to the list of subjets. Filter the jet using $R_{\rm filt} = \min(0.3, \Delta R_{ij}/2)$, keeping the five hardest subjets (where $\Delta R_{ij}$ is the distance between the two hardest subjets). Select the three subjets whose invariant mass is closest to $m_t$ [10]. The top candidate is rejected if there are fewer than three subjets or if the top candidate mass exceeds 500 GeV. The output of the tagger is $m_t$, $m_W$, and $\theta_h$ (as defined in the Johns Hopkins Tagger). The two free input parameters of the HEPTopTagger in this study are $m$ and $\mu$, defined above, and their values are optimized for different jet kinematics and parameters in Section 7.

Top-tagging with Pruning or Trimming: In the studies presented in Section 7 we add a W reconstruction step to the pruning and trimming algorithms, to enable a fairer comparison with the dedicated top tagging algorithms described above. Following the method of the BOOST 2011 report [8], a W candidate is found as follows: if there are two subjets, the highest-mass subjet is the W candidate (because the W prongs end up clustered in the same subjet), and the W candidate mass, $m_W$, is the mass of this subjet; if there are three subjets, the pair of subjets with the smallest invariant mass comprises the W candidate, and $m_W$ is the invariant mass of this subjet pair. In the case of only one subjet, the top candidate is rejected. The top mass, $m_t$, is the full mass of the groomed jet.


3.4 Other Jet Substructure Observables

The jet substructure observables defined in this section are calculated using jet constituents prior to any grooming. This approach has been used in several analyses in the past, for example [38, 39], whilst others have used the approach of only considering the jet constituents that survive the grooming procedure [40]. We take the first approach throughout our analyses, as this approach allows a study of both the hard and soft radiation characteristic of signal vs. background. However, we do include the effects of initial state radiation and the underlying event, and unsurprisingly these can have a non-negligible effect on variable performance, particularly at large $p_T$ and jet $R$. This suggests that the differences we see between variable performance at large $p_T$ and $R$ will be accentuated in a high pile-up environment, necessitating a dedicated study of pile-up to recover as much as possible the “ideal” performance seen here. Such a study is beyond the scope of this paper.


Qjet mass volatility: As described above, Qjet algorithms re-cluster the same jet non-deterministically to obtain a collection of interpretations of the jet. For each jet interpretation, the pruned jet mass is computed with the default pruning parameters. The mass volatility, $\Gamma_{\rm Qjet}$, is defined as [31]

$$\Gamma_{\rm Qjet} = \frac{\sqrt{\langle m_j^2\rangle - \langle m_j\rangle^2}}{\langle m_j\rangle}, \tag{8}$$

where averages are computed over the Qjet interpretations. We use a rigidity parameter of $\alpha = 0.1$ (although other studies suggest a smaller value of $\alpha$ may be optimal [31, 32]), and 25 trees per event for all of the studies presented here.
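Eq. (8) is simply the relative standard deviation of the pruned mass over a jet's Qjet interpretations. A minimal sketch, assuming the list of pruned masses has already been produced by the Qjet re-clusterings (the input numbers are made up, and the function name is hypothetical):

```python
import math
import statistics


def mass_volatility(masses):
    """Gamma_Qjet = sqrt(<m^2> - <m>^2) / <m>, Eq. (8), over a jet's Qjet interpretations."""
    mean = statistics.fmean(masses)
    mean_sq = statistics.fmean(m * m for m in masses)
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(mean_sq - mean * mean, 0.0)) / mean


# Pruned masses (GeV) from two Qjet trees of one jet (toy numbers).
gamma = mass_volatility([60.0, 100.0])  # 0.25
```

A jet whose pruned mass is stable under re-clustering has $\Gamma_{\rm Qjet}$ near zero, while one whose mass depends on the clustering history has a large volatility.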


N-subjettiness: N-subjettiness [41] quantifies how well the radiation in the jet is aligned along N directions. To compute N-subjettiness, one must first identify N axes within the jet. Then,

$$\tau_N^{(\beta)} = \frac{1}{d_0}\sum_i p_{Ti}\,\min\left(\Delta R_{1i}^{\beta}, \ldots, \Delta R_{Ni}^{\beta}\right), \tag{9}$$

where distances are between particles $i$ in the jet and the axes,

$$d_0 = \sum_i p_{Ti}\, R^{\beta}, \tag{10}$$

and $R$ is the jet clustering radius. The exponent $\beta$ is a free parameter. There is also some choice in how the axes used to compute N-subjettiness are determined. The optimal configuration of axes is the one that minimizes N-subjettiness; recently, it was shown that the “winner-take-all” (WTA) axes can be easily computed and have superior performance compared to other minimization techniques [42]. We use both the WTA (Section 7) and one-pass $k_T$ optimization axes (Sections 5 and 6) in our studies.

Often, a powerful discriminant is the ratio

$$\tau_{N,N-1}^{(\beta)} \equiv \frac{\tau_N^{(\beta)}}{\tau_{N-1}^{(\beta)}}. \tag{11}$$

While this is not an infrared and collinear (IRC) safe observable, it is calculable [43] and can be made IRC safe with a loose lower cut on $\tau_{N-1}$.
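Given a set of axes, Eqs. (9)–(11) are straightforward to evaluate. The sketch below assumes the axes are supplied by the caller (it does not implement the one-pass $k_T$ or WTA axis determination) and represents particles as (pT, eta, phi) tuples; the names are hypothetical.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def tau_n(particles, axes, R, beta=1.0):
    """N-subjettiness, Eqs. (9)-(10); particles are (pT, eta, phi), axes are (eta, phi)."""
    d0 = sum(pt for pt, _, _ in particles) * R ** beta
    total = 0.0
    for pt, eta, phi in particles:
        # Each particle contributes its pT times its distance^beta to the nearest axis.
        total += pt * min(delta_r(eta, phi, a_eta, a_phi) ** beta for a_eta, a_phi in axes)
    return total / d0


def tau_ratio(particles, axes_n, axes_n_minus_1, R, beta=1.0):
    """tau_{N,N-1} of Eq. (11); assumes tau_{N-1} > 0."""
    return tau_n(particles, axes_n, R, beta) / tau_n(particles, axes_n_minus_1, R, beta)


two_prong = [(100.0, 0.0, 0.0), (100.0, 0.4, 0.0)]
tau1 = tau_n(two_prong, [(0.2, 0.0)], R=0.8)
tau21 = tau_ratio(two_prong, [(0.0, 0.0), (0.4, 0.0)], [(0.2, 0.0)], R=0.8)
```

For two equal-$p_T$ prongs separated by $\Delta R$ = 0.4 in an $R$ = 0.8 jet, a single axis at their midpoint gives $\tau_1^{(1)}$ = 0.25, while two axes on the prongs give $\tau_2^{(1)}$ = 0 and hence $\tau_{2,1}^{(1)}$ = 0, the limiting two-prong value.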
Energy correlation functions: The transverse momentum version of the energy correlation functions is defined as [44]:

$$\mathrm{ECF}(N,\beta) = \sum_{i_1<i_2<\cdots<i_N \in j}\left(\prod_{a=1}^{N} p_{Ti_a}\right)\left(\prod_{b=1}^{N-1}\prod_{c=b+1}^{N}\Delta R_{i_b i_c}\right)^{\beta}, \tag{12}$$

where $i$ is a particle inside the jet $j$, and $\mathrm{ECF}(0,\beta) \equiv 1$. It is preferable to work in terms of dimensionless quantities, particularly the energy correlation function double ratio:

$$C_N^{(\beta)} = \frac{\mathrm{ECF}(N+1,\beta)\,\mathrm{ECF}(N-1,\beta)}{\mathrm{ECF}(N,\beta)^2}. \tag{13}$$

This observable measures higher-order radiation from leading-order substructure. Note that $C_1^{(\beta=0)} = \left[1 - (p_TD)^2\right]/2$ is a monotonic function of the variable $p_TD$ introduced by CMS in [45], and so carries the same discriminating information.
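The brute-force sketch below evaluates Eqs. (12) and (13) directly and checks numerically that $C_1^{(\beta=0)} = [1 - (p_TD)^2]/2$. Particles are (pT, eta, phi) tuples and the function names are our own; the cost grows as $n^N$ for $n$ particles, so this is only suitable for small toy jets.

```python
import itertools
import math


def delta_r(a, b):
    dphi = (a[2] - b[2] + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(a[1] - b[1], dphi)


def ecf(particles, n, beta):
    """Eq. (12) by brute force over all N-tuples; ECF(0, beta) = 1 by convention."""
    if n == 0:
        return 1.0
    total = 0.0
    for combo in itertools.combinations(particles, n):
        pt_product = math.prod(p[0] for p in combo)
        angle_product = math.prod(delta_r(a, b) for a, b in itertools.combinations(combo, 2))
        total += pt_product * angle_product ** beta
    return total


def double_ratio(particles, n, beta):
    """C_N^(beta) of Eq. (13)."""
    return ecf(particles, n + 1, beta) * ecf(particles, n - 1, beta) / ecf(particles, n, beta) ** 2


def ptd(particles):
    """p_T D = sqrt(sum pT_i^2) / sum pT_i."""
    return math.sqrt(sum(p[0] ** 2 for p in particles)) / sum(p[0] for p in particles)


jet = [(50.0, 0.0, 0.0), (30.0, 0.1, 0.2), (20.0, -0.2, 0.1)]
c1 = double_ratio(jet, 1, 0.0)  # sum_{i<j} pT_i pT_j / (sum pT)^2 = 0.31
```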


4 Multivariate Analysis Techniques

Multivariate techniques are used to combine multiple variables into a single discriminant in an optimal manner. The extent to which the discrimination power increases in a multivariable combination indicates to what extent the discriminatory information in the variables overlaps. There exist alternative strategies for studying correlations in discrimination power, such as “truth matching” [46], but these are not explored here.

In all cases, the multivariate technique used to combine variables is a Boosted Decision Tree (BDT) as implemented in the TMVA package [47]. An example of the BDT settings used in these studies, chosen to reduce the effect of overtraining, is given in [47]. We use the BDT implementation with gradient boosting. Additionally, the simulated data were split into training and testing samples, and the BDT outputs on the two samples were compared to ensure that the BDT performance was not affected by overtraining.

5 Quark-Gluon Discrimination

In this section, we examine the differences between quark- and gluon-initiated jets in terms of substructure variables. At a fundamental level, the primary difference between quark- and gluon-initiated jets is the color charge of the initiating parton, typically expressed in terms of the ratio of the corresponding Casimir factors $C_F/C_A = 4/9$. Since the quark has the smaller color charge, it radiates less than a corresponding gluon and the naive expectation is that the resulting quark jet will contain fewer constituents than the corresponding gluon jet. The differing color structure of the two types of jet will also be realized in the detailed behavior of their radiation patterns. We determine the extent to which the substructure observables capturing these differences are correlated, providing some theoretical understanding of these variables and their performance. The motivation for these studies arises not only from the desire to “tag” a jet as originating from a quark or gluon, but also to improve our understanding of the quark and gluon components of the QCD backgrounds relative to boosted resonances. While recent studies have suggested that quark/gluon tagging efficiencies depend strongly on the Monte Carlo generator used [48, 49], we are more interested in understanding the scaling performance with $p_T$ and $R$, and the correlations between observables, which are expected to be treated consistently within a single shower scheme.

Other examples of recent analytic studies of the correlations between jet observables relevant to quark jet versus gluon jet discrimination can be found in [43, 46, 50, 51].


5.1 Methodology and Observable Classes 

These studies use the qq and gg MC samples described in Section 2. The showered events were clustered with FastJet 3.0.3 using the anti-$k_T$ algorithm with jet radii of $R$ = 0.4, 0.8, 1.2. In both signal (quark) and background (gluon) samples, an upper and lower cut on the leading jet $p_T$ is applied after showering/clustering, to ensure similar $p_T$ spectra for signal and background in each $p_T$ bin. The bins in leading jet $p_T$ that are considered are 300–400 GeV, 500–600 GeV, 1.0–1.1 TeV, for the 300–400 GeV, 500–600 GeV, 1.0–1.1 TeV parton $p_T$ slices respectively. Various jet grooming approaches are applied to the jets, as described in Section 3.4. Only leading and subleading jets in each sample are used. The following observables are studied in this section:

- Number of constituents ($n_{\rm constits}$) in the jet.

- Pruned Qjet mass volatility, $\Gamma_{\rm Qjet}$.

- 1-point energy correlation functions, $C_1^{(\beta)}$ with $\beta$ = 0, 1, 2.

- 1-subjettiness, $\tau_1^{(\beta)}$ with $\beta$ = 1, 2. The N-subjettiness axes are computed using one-pass $k_T$ axis optimization.

- Ungroomed jet mass, $m$.

For simplicity, we hereafter refer to quark-initiated jets (gluon- 
initiated jets) as quark jets (gluon jets). 

We will demonstrate that, in terms of their jet-by-jet correlations and their ability to separate quark jets from gluon jets, the above observables fall into five Classes. The first three observables, $n_{\rm constits}$, $\Gamma_{\rm Qjet}$ and $C_1^{(\beta=0)}$, each constitute a Class of their own (Classes I to III) in the sense that they each carry some independent information about a jet and, when combined, provide substantially better quark jet and gluon jet separation than any one observable alone. Of the remaining observables, $C_1^{(\beta=1)}$ and $\tau_1^{(\beta=1)}$ comprise a single class (Class IV) because their distributions are similar for a sample of jets, their jet-by-jet values are highly correlated, and they exhibit very similar power to separate quark jets and gluon jets (with very similar dependence on the jet parameters $R$ and $p_T$); this separation power is not improved when they are combined. The fifth class (Class V) is composed of $C_1^{(\beta=2)}$, $\tau_1^{(\beta=2)}$ and the (ungroomed) jet mass. Again the jet-by-jet correlations are strong (even though the individual observable distributions are somewhat different), the quark versus gluon separation power is very similar (including the $R$ and $p_T$ dependence), and little is achieved by combining more than one of the Class V observables. This class structure is not surprising given that the observables within a class exhibit very similar dependence on the kinematics of the underlying jet constituents. For example, the members of Class V are constructed from a sum over pairs of constituents using products of the energy of each member of the pair times the angular separation squared for the pair (this is apparent for the ungroomed mass when viewed in terms of a mass-squared with small angular separations). By the same argument, the Class IV and Class V observables will be seen to be more similar than any other pair of classes, differing only in the power ($\beta$) of the dependence on the angular separations, which produces small but detectable differences. We will return to a more complete discussion of jet masses in Section 5.4.
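The statement that the ungroomed mass belongs with the other Class V observables can be checked numerically. For massless constituents, $m^2 = \sum_{i<j} 2\,p_{Ti}p_{Tj}(\cosh\Delta\eta_{ij} - \cos\Delta\phi_{ij})$, which reduces to $\sum_{i<j} p_{Ti}p_{Tj}\Delta R_{ij}^2$ at small angles. The sketch below, using a made-up collimated three-particle jet and hypothetical function names, compares the exact invariant mass squared with this pairwise form.

```python
import itertools
import math


def four_vector(pt, eta, phi):
    """Massless four-momentum (px, py, pz, E) from (pT, eta, phi)."""
    return (pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta), pt * math.cosh(eta))


def mass_squared(particles):
    """Exact invariant mass squared of the summed massless constituents."""
    px, py, pz, e = (sum(c) for c in zip(*(four_vector(*p) for p in particles)))
    return e * e - px * px - py * py - pz * pz


def pairwise_mass_squared(particles):
    """Small-angle form: sum over pairs of pT_i * pT_j * Delta R_ij^2."""
    total = 0.0
    for (pt_i, eta_i, phi_i), (pt_j, eta_j, phi_j) in itertools.combinations(particles, 2):
        dphi = (phi_i - phi_j + math.pi) % (2 * math.pi) - math.pi
        total += pt_i * pt_j * ((eta_i - eta_j) ** 2 + dphi ** 2)
    return total


jet = [(100.0, 0.0, 0.0), (50.0, 0.05, 0.03), (30.0, -0.04, 0.02)]
exact = mass_squared(jet)
approx = pairwise_mass_squared(jet)
```

For this narrow configuration the two agree to well below a percent, illustrating why $m$, $C_1^{(\beta=2)}$ and $\tau_1^{(\beta=2)}$ share the same pairwise $p_T\,p_T\,\Delta R^2$ structure.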


5.2 Single Variable Discrimination 

Figure 1 shows the quark and gluon distributions of different substructure observables in the $p_T$ = 500–600 GeV bin for $R$ = 0.8 jets. These distributions illustrate some of the distinctions between the Classes made above. The fundamental difference between quarks and gluons, namely their color charge and consequent amount of radiation in the jet, is clearly indicated in Figure 1(a), suggesting that simply counting constituents provides good separation between quark and gluon jets. In fact, among the observables considered, one can see by eye that $n_{\rm constits}$ should provide the highest separation power, i.e., the quark and gluon distributions are most distinct, as was originally noted in [49, 52]. Figure 1 further suggests that $C_1^{(\beta=0)}$ should provide the next best separation, followed by $C_1^{(\beta=1)}$, as was also found by the CMS and ATLAS Collaborations [48, 53].

To more quantitatively study the power of each observable as a discriminator for quark/gluon tagging, Receiver Operating Characteristic (ROC) curves are built by scanning each distribution and plotting the background efficiency (to select gluon jets) vs. the signal efficiency (to select quark jets). Figure 2 shows these ROC curves for all of the substructure variables shown in Figure 1 for $R$ = 0.4, 0.8 and 1.2 jets (in the $p_T$ = 300–400 GeV bin). In addition, the ROC curve for a tagger built from a BDT combination of all the variables (see Section 4) is shown. As suggested earlier, $n_{\rm constits}$ is the best performing variable for all $R$ values, although $C_1^{(\beta=0)}$ is not far behind, particularly for $R$ = 0.8. Most other variables have similar performance, with the main exception of $\Gamma_{\rm Qjet}$, which shows significantly worse discrimination (this may be due to our choice of rigidity $\alpha = 0.1$, with other studies suggesting that a smaller value, such as $\alpha = 0.01$, produces better results [31, 32]). The combination of all variables shows somewhat better discrimination than any individual observable, and we give a more detailed discussion in Section 5.3 of the correlations between the observables and their impact on the combined discrimination power.
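A ROC curve of the kind shown in Figure 2 can be built by scanning a cut on a single observable. The sketch below assumes quark jets populate lower values (as for $n_{\rm constits}$); for other observables the cut direction would be chosen accordingly. The multiplicities are made-up toy numbers, not results from this study, and the function name is hypothetical.

```python
def roc_points(signal, background, thresholds):
    """(eps_sig, eps_bkg) for the cut 'x < t' at each threshold t.

    Assumes the signal (quark jets) populates lower values, as for n_constits;
    flip the comparison for observables where signal sits higher.
    """
    points = []
    for t in thresholds:
        eps_sig = sum(x < t for x in signal) / len(signal)
        eps_bkg = sum(x < t for x in background) / len(background)
        points.append((eps_sig, eps_bkg))
    return points


# Toy constituent multiplicities (made-up numbers, not from the study).
quark = [10, 12, 14, 16]
gluon = [14, 18, 20, 22]
curve = roc_points(quark, gluon, thresholds=[13, 15, 23])
```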

We now examine how the performance of the substructure observables varies with $p_T$ and $R$. To present the results in a “digestible” fashion we focus on the gluon jet “rejection” factor, $1/\epsilon_{\rm bkg}$, for a quark signal efficiency, $\epsilon_{\rm sig}$, of 50%. We can use the values of $1/\epsilon_{\rm bkg}$ generated for the 9 kinematic points introduced above ($R$ = 0.4, 0.8, 1.2 and the 100 GeV $p_T$ bins with lower limits $p_T$ = 300 GeV, 500 GeV, 1000 GeV) to generate surface plots. The surface plots in Figure 3 indicate both the level of gluon rejection and the variation with $p_T$ and $R$ for each single observable studied. The color shading in these plots is defined so that a value of $1/\epsilon_{\rm bkg} = 1$ yields the color “violet”, while $1/\epsilon_{\rm bkg} = 20$ yields the color “red”. The “rainbow” of colors in between varies linearly with $\log_{10}(1/\epsilon_{\rm bkg})$.
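The rejection factor at 50% signal efficiency and its position on the color scale of Figure 3 can be computed as sketched below. The cut direction ($x \le t$, signal at low values), the handling of ties, and the clamping to the range 1–20 are assumptions of this illustration; the inputs are toy numbers and the function names are hypothetical.

```python
import math


def rejection_at(signal, background, eps_target=0.5):
    """1/eps_bkg for the cut 'x <= t' that keeps a fraction eps_target of signal.

    Assumes signal populates lower values; with tied values the cut may keep
    slightly more than eps_target of the signal.
    """
    ordered = sorted(signal)
    k = math.ceil(eps_target * len(ordered))  # number of signal jets that must pass
    t = ordered[k - 1]
    eps_bkg = sum(x <= t for x in background) / len(background)
    return math.inf if eps_bkg == 0 else 1.0 / eps_bkg


def color_position(rejection, r_min=1.0, r_max=20.0):
    """Position on the violet (0) to red (1) scale, linear in log10(1/eps_bkg)."""
    r = min(max(rejection, r_min), r_max)
    return (math.log10(r) - math.log10(r_min)) / (math.log10(r_max) - math.log10(r_min))


r = rejection_at(list(range(1, 11)), list(range(5, 15)))  # 10.0
shade = color_position(r)
```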

We organize our results by the classes introduced in the 
previous subsection: 

Class I: The sole constituent of this class is $n_{\rm constits}$. We see in Figure 3(a) that, as expected, the numerically largest rejection rates occur for this observable, with the rejection factor ranging from 6 to 11 and varying rather dramatically with $R$. As $R$ increases the jet collects more constituents from the underlying event, which are the same for quark and gluon jets, and the separation power decreases. At large $R$, there is some improvement with increasing $p_T$ due to the enhanced QCD radiation, which is different for quarks vs. gluons.

Class II: The variable $\Gamma_{\rm Qjet}$ constitutes this class. Figure 3(b) confirms the limited efficacy of this single observable (at least for our parameter choices) with a rejection rate only in the range 2.5 to 2.8. On the other hand, this observable probes a very different property of jet substructure, i.e., the sensitivity to detailed changes in the grooming procedure, and this difference is suggested by the distinct $R$ and $p_T$ dependence illustrated in Figure 3(b). The rejection rate increases with increasing $R$ and decreasing $p_T$, since the distinction between quark and gluon jets for this observable arises from the relative importance of the one “hard” gluon emission configuration. The role of this contribution is enhanced for both decreasing $p_T$ and increasing $R$. This general variation with $p_T$ and $R$ is the opposite of what is exhibited in all of the other single variable plots in Figure 3.



[Fig. 1 panels (qg, $p_T$ = 500-600 GeV, AK8): (a) $n_{\rm constits}$; (b) $\Gamma_{\rm Qjet}$; (c) $C_1^{\beta=0}$; (d) $C_1^{\beta=1}$; (e) $\tau_1^{\beta=1}$; (f) $C_1^{\beta=2}$; (g) $\tau_1^{\beta=2}$; (h) Ungroomed mass]

Fig. 1 Comparisons of quark and gluon distributions of different substructure variables, organized by Class, for leading jets in the $p_T$ = 500-600 GeV bin using the anti-$k_T$ $R = 0.8$ algorithm. The first three plots are Classes I-III, with Class IV in the second row, and Class V in the third row.


Class III: The only member of this class is $C_1^{\beta=0}$. Figure 3(c) indicates that this observable can itself provide a rejection rate in the range 7.8 to 8.6 (intermediate between the two previous observables), and again with distinct $R$ and $p_T$ dependence. In this case the rejection rate decreases slowly with increasing $R$, which follows from the fact that $\beta = 0$ implies no weighting of $\Delta R$ in the definition of $C_1^{\beta=0}$, greatly reducing the angular dependence. The rejection rate peaks at intermediate $p_T$ values, an effect visually enhanced by the limited number of $p_T$ values included.
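To make the role of $\beta$ concrete, the sketch below evaluates $C_1^{\beta}$ assuming the standard hadron-collider form of the energy correlation functions, $C_1^{\beta} = {\rm ECF}(2,\beta)/{\rm ECF}(1,\beta)^2$ with ${\rm ECF}(1,\beta) = \sum_i p_{T,i}$ and ${\rm ECF}(2,\beta) = \sum_{i<j} p_{T,i}\,p_{T,j}\,\Delta R_{ij}^{\beta}$; the definition given earlier in this report takes precedence if it differs. Constituents are hypothetical `(pt, y, phi)` tuples. For $\beta = 0$ every pair is weighted by $\Delta R_{ij}^0 = 1$, so the result depends only on how the jet $p_T$ is shared among constituents, not on their angular spread.

```python
import math
from itertools import combinations


def delta_r(a, b):
    """Distance in the (rapidity, azimuth) plane between (pt, y, phi) tuples."""
    dphi = (a[2] - b[2] + math.pi) % (2.0 * math.pi) - math.pi  # wrap into [-pi, pi)
    return math.hypot(a[1] - b[1], dphi)


def c1(constituents, beta):
    """C_1^(beta) = ECF(2, beta) / ECF(1, beta)^2, with
    ECF(1) = sum_i pT_i and ECF(2) = sum_{i<j} pT_i pT_j dR_ij^beta."""
    ecf1 = sum(c[0] for c in constituents)
    ecf2 = sum(a[0] * b[0] * delta_r(a, b) ** beta
               for a, b in combinations(constituents, 2))
    return ecf2 / ecf1 ** 2


if __name__ == "__main__":
    narrow = [(100.0, 0.0, 0.0), (100.0, 0.1, 0.0)]  # toy two-particle jets
    wide = [(100.0, 0.0, 0.0), (100.0, 0.6, 0.0)]
    for beta in (0, 1, 2):
        print(beta, c1(narrow, beta), c1(wide, beta))
```

In these toy jets, moving the second constituent from $\Delta R = 0.1$ to 0.6 leaves $C_1^{\beta=0}$ unchanged at 1/4 but increases $C_1^{\beta=1}$ sixfold.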

Class IV: Figures 3(d) and (e) confirm the very similar properties of the observables $C_1^{\beta=1}$ and $\tau_1^{\beta=1}$ (as already suggested in Figures 1(d) and (e)). They have essentially identical rejection rates (4.1 to 5.4) and identical $R$ and $p_T$ dependence (a slow decrease with increasing $R$ and an even slower increase with increasing $p_T$).
Fig. 2 The ROC curves for all single variables considered for quark-gluon discrimination in the $p_T$ = 300-400 GeV bin using the anti-$k_T$ $R = 0.4$ (top-left), 0.8 (top-right) and 1.2 (bottom) algorithm.


Class V: The observables $C_1^{\beta=2}$, $\tau_1^{\beta=2}$, and $m$ have similar rejection rates in the range 3.5 to 5.3, as well as very similar $R$ and $p_T$ dependence (a slow decrease with increasing $R$ and an even slower increase with increasing $p_T$).

Arguably, drawing a distinction between the Class IV and Class V observables is a fine point, but the color shading does suggest some distinction, with a slightly smaller rejection rate in Class V. Again, the strong similarities between the plots within the second and third rows in Figure 3 speak to the common properties of the observables within the two classes.

In summary, the overall discriminating power between quark and gluon jets tends to decrease with increasing $R$ (except for the $\Gamma_{\rm Qjet}$ observable), presumably in large part due to the contamination from the underlying event. Since the construction of the $\Gamma_{\rm Qjet}$ observable explicitly involves pruning away the soft, large-angle constituents, it is not surprising that it exhibits a different $R$ dependence. In general the discriminating power increases slowly and monotonically with $p_T$ (except for the $\Gamma_{\rm Qjet}$ and $C_1^{\beta=0}$ observables). This is presumably due to the overall increase in radiation from high-$p_T$ objects, which accentuates the differences in the quark and gluon color charges and provides some increase in discrimination. In the following subsection, we study the effect of combining multiple observables.


5.3 Combined Performance and Correlations 

Combining multiple observables in a BDT can give further improvement over cuts on a single variable. Since the improvement from combining correlated observables is expected to be inferior to that from combining uncorrelated observables, studying the performance of multivariable combinations gives insight into the correlations between substructure variables and the physical features allowing for quark/gluon discrimination. Based on our discussion of the correlated properties of observables within a single class, we expect little improvement in the rejection rate when combining observables from the same class, and substantial improvement when combining observables from different classes. Our classification of observables for quark/gluon tagging therefore motivates the study of particular combinations of variables for use in experimental analyses.
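The expectation that correlated observables add little can be illustrated with a toy model, sketched here under explicit assumptions: two Gaussian observables with unit widths, equal quark/gluon mean shifts $d$ and correlation $\rho$, combined with a linear Fisher discriminant (the optimal classifier for Gaussian classes sharing a covariance matrix) rather than the BDT used in this study. The combined separation is $D = \sqrt{\Delta\mu^{T}\Sigma^{-1}\Delta\mu} = d\sqrt{2/(1+\rho)}$, which falls back towards the single-variable value $d$ as $\rho \to 1$. The function names are hypothetical.

```python
import math
import numpy as np


def fisher_separation(delta_mu, cov):
    """Separation D = sqrt(dmu^T Sigma^{-1} dmu), in units of the projected width,
    of the optimal linear (Fisher) discriminant for two Gaussian classes that
    share the covariance matrix `cov` and differ in mean by `delta_mu`."""
    delta_mu = np.atleast_1d(np.asarray(delta_mu, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    return float(np.sqrt(delta_mu @ np.linalg.solve(cov, delta_mu)))


def rejection_at_half_efficiency(separation):
    """1/eps_bkg for a cut at the signal median of a unit-width Gaussian
    discriminant: eps_bkg = Phi(-D) = erfc(D / sqrt(2)) / 2."""
    return 2.0 / math.erfc(separation / math.sqrt(2.0))


if __name__ == "__main__":
    d = 1.0  # hypothetical single-observable separation (toy value)
    print("single observable:", rejection_at_half_efficiency(d))
    for rho in (0.0, 0.5, 0.9, 0.99):
        cov = [[1.0, rho], [rho, 1.0]]
        pair = fisher_separation([d, d], cov)  # analytically d * sqrt(2 / (1 + rho))
        print(f"rho = {rho:4.2f}: pair rejection = {rejection_at_half_efficiency(pair):.2f}")
```

With $d = 1$, the single-observable rejection at 50% signal efficiency is about 6.3; an uncorrelated partner roughly doubles it, while a partner with $\rho = 0.99$ barely changes it, mirroring the within-class versus cross-class expectation stated above.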

To quantitatively study the improvement obtained from multivariate analyses, we build quark/gluon taggers from every pair-wise combination of variables studied in the previous section; we also compare the pair-wise performance with the all-variables combination. To illustrate the results achieved in this way, we use the same 2D surface plots as in Figure 3. Figure 4 shows pair-wise plots for variables in (a) Class IV and (b) Class V. Comparing to the corresponding plots in Figure 3, we see that combining $C_1^{\beta=1} + \tau_1^{\beta=1}$ provides a small (~10%) improvement in the rejection rate with essentially no change in the $R$ and $p_T$ dependence, while combining $C_1^{\beta=2} + \tau_1^{\beta=2}$ yields a rejection rate that is essentially identical to the single-observable rejection rate for all $R$ and $p_T$ values (with a similar conclusion if one of these observables is replaced with the ungroomed jet mass $m$). This confirms the expectation that the observables within a single class effectively probe the same jet properties.

Fig. 3 Surface plots of $1/\epsilon_{\rm bkg}$ for all single variables considered for quark-gluon discrimination as functions of $R$ and $p_T$. The first three plots are Classes I-III, with Class IV in the second row, and Class V in the third row.

Next, we consider cross-class pairs of observables in Figure 5, where, except in the one case noted below, we use only a single observable from each class for illustrative purposes. Since $n_{\rm constits}$ is the best-performing single variable, the largest rejection rates are obtained from combining another observable with $n_{\rm constits}$ (Figures 5(a) to (e)). In general, the rejection rates are larger for the pair-wise case than for the single-variable case. In particular, the pair $n_{\rm constits} + C_1^{\beta=1}$ in Figure 5(b) yields rejection rates in the range 6.4 to 14.7, with the largest values at small $R$ and large $p_T$. As expected, the pair $n_{\rm constits} + \tau_1^{\beta=1}$ in Figure 5(e) yields very similar rejection rates (6.4 to 15.0), since $C_1^{\beta=1}$ and $\tau_1^{\beta=1}$ are both in Class IV. The other pairings with $n_{\rm constits}$ yield smaller rejection rates and smaller dynamic ranges. The pair $n_{\rm constits} + C_1^{\beta=0}$ (Figure 5(d)) exhibits the smallest range of rates (8.3 to 11.3), suggesting that the differences between these two observables serve to substantially reduce the $R$ and $p_T$ dependence for the pair. The other pairs shown exhibit similar behavior.

The $R$ and $p_T$ dependence of the pair-wise combinations is generally similar to that of the single observable with the most dependence on $R$ and $p_T$. The smallest $R$ and $p_T$ variation always occurs when pairing with $C_1^{\beta=0}$. Replacing any of the observables in these pairs with a different observable from the same class (e.g., $C_1^{\beta=2}$ for $\tau_1^{\beta=2}$) produces very similar results.
Fig. 4 Surface plots of $1/\epsilon_{\rm bkg}$ for the indicated pairs of variables from (a) Class IV and (b) Class V considered for quark-gluon discrimination as functions of $R$ and $p_T$.


Figure 5(l) shows the performance of a BDT combination of all the current observables, with rejection rates in the range 10.5 to 17.1. The performance is very similar to that observed for the pair-wise $n_{\rm constits} + C_1^{\beta=1}$ and $n_{\rm constits} + \tau_1^{\beta=1}$ combinations, but with a somewhat narrower range and slightly larger maximum values. This suggests that almost all of the available information to discriminate quark- and gluon-initiated jets is captured by the $n_{\rm constits}$ and $C_1^{\beta=1}$ (or $\tau_1^{\beta=1}$) variables; this confirms the finding of [52] that near-optimal performance can be obtained with a pair of variables.

Some features are more easily seen with an alternative presentation of the data. In Figures 6 and 7 we fix $R$ and $p_T$ and simultaneously show the single- and pair-wise observable performance in a single matrix. The numbers in each cell are the same gluon rejection rate used earlier, $1/\epsilon_{\rm bkg}$, with $\epsilon_{\rm sig}$ = 50% (quarks). Figure 6 shows the results for $p_T$ = 1-1.1 TeV and $R = 0.4, 0.8, 1.2$, while Figure 7 is for $R = 0.4$ and the 3 $p_T$ bins. The single-observable rejection rates appear on the diagonal, and the pair-wise results are off the diagonal. The largest pair-wise rejection rate, as already suggested by Figure 5(e), appears at large $p_T$ and small $R$ for the pair $n_{\rm constits} + \tau_1^{\beta=1}$ (with very similar results for $n_{\rm constits} + C_1^{\beta=1}$). The correlations indicated by the shading$^1$ should be largely understood as indicating the organization of the observables into the now-familiar classes. The all-observable (BDT) result appears as the number at the lower right in each plot.

5.4 QCD Jet Masses 

To close the discussion of q/g-tagging, we provide some insight into the behavior of the masses of QCD jets initiated by both kinds of partons, with and without grooming. Recall that, in practice, an identified jet is simply a list of constituents, i.e., final-state particles. To the extent that the masses of these individual constituents can be neglected (due to the constituents being relativistic), each constituent has a “well-defined” 4-momentum from its energy and direction. It follows that the 4-momentum of the jet is simply the sum of the 4-momenta of the constituents, and its square is the jet mass squared. Simply on dimensional grounds, we know that the jet mass must have an overall linear scaling with $p_T$, with the remaining $p_T$ dependence arising predominantly from the running of the coupling, $\alpha_s(p_T)$. The $R$ dependence is also crudely linear, as the jet mass scales approximately with the largest angular opening between any 2 constituents, which is set by $R$.

1 The connection between the value of the rejection rate and the shading color in Figures 6 and 7 is the same as that in Figures 3 to 5.
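The construction described in this paragraph, summing massless constituent four-momenta and squaring the result, can be written in a few lines. This is a minimal sketch assuming constituents given as hypothetical `(pt, eta, phi)` tuples and treated as massless (so rapidity equals pseudorapidity); it also returns the jet $p_T$ so that the scaled variable $m/p_T/R$ used below can be formed.

```python
import math


def four_momentum(pt, eta, phi):
    """(E, px, py, pz) of a massless particle from pT, pseudorapidity and azimuth."""
    return (pt * math.cosh(eta), pt * math.cos(phi),
            pt * math.sin(phi), pt * math.sinh(eta))


def jet_mass_and_pt(constituents):
    """Sum massless constituent four-momenta; return (jet mass, jet pT)."""
    E = px = py = pz = 0.0
    for pt, eta, phi in constituents:
        e, x, y, z = four_momentum(pt, eta, phi)
        E, px, py, pz = E + e, px + x, py + y, pz + z
    m2 = E * E - px * px - py * py - pz * pz
    # max() guards against tiny negative values from floating-point rounding
    return math.sqrt(max(m2, 0.0)), math.hypot(px, py)


if __name__ == "__main__":
    jet = [(300.0, 0.10, 0.00), (150.0, 0.35, 0.20), (20.0, -0.30, 0.50)]  # toy jet
    m, pt = jet_mass_and_pt(jet)
    print(f"m = {m:.1f} GeV, pT = {pt:.1f} GeV, m/pT/R (R = 0.8) = {m / pt / 0.8:.3f}")
```

For two constituents the result reduces to $m^2 = 2p_{T,1}p_{T,2}(\cosh\Delta\eta - \cos\Delta\phi)$, which grows with the opening angle as the $R$-scaling argument assumes.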

To demonstrate this universal behavior for jet mass, we first note that if we consider the mass distributions for many kinematic points (various values of $R$ and $p_T$), we observe considerable variation in behaviour. This variation, however, can largely be removed by plotting versus the scaled variable $m/p_T/R$. The mass distributions for quark and gluon jets versus $m/p_T/R$ for all of our kinematic points are shown in Figure 8, where we use a logarithmic scale on the y-axis to clearly exhibit the behavior of these distributions over a large dynamic range. We observe that the distributions for the different kinematic points do approximately scale as expected, i.e., the simple arguments above capture most of the variation with $R$ and $p_T$. We will shortly consider an explanation of the residual non-scaling. A more rigorous quantitative understanding of jet mass distributions requires all-orders calculations in QCD, which have been performed for groomed and ungroomed jet mass spectra at high logarithmic accuracy, both in the context of direct QCD resummation [37, 54-56] and Soft Collinear Effective Theory [57-59].

Several features of Figure 8 can be easily understood. The distributions all cut off rapidly for $m/p_T/R > 0.5$, which is understood as the precise limit (maximum mass) for a jet composed of just 2 constituents. As expected from the soft and collinear singularities in QCD, the mass distribution peaks at small mass values. The actual peak is “pushed” away from the origin by the so-called Sudakov form factor. Summing the corresponding logarithmic structure (singular in both $p_T$ and angle) to all orders in perturbation theory yields a distribution that is highly damped as the mass vanishes. In words, there is precisely zero probability that a colored parton emits no radiation (and the resulting jet has zero mass). Above the Sudakov-suppressed part of phase space, there are two structures in the distribution: the “shoulder” and the “peak”. The large-mass shoulder ($0.3 < m/p_T/R < 0.5$) is driven largely by the presence of a single large-angle, energetic emission in the underlying QCD shower, i.e., this regime is quite well described by low-order perturbation theory.$^2$ In contrast, we can think of the peak region as corresponding to multiple soft emissions. This simple, necessarily approximate picture provides an understanding of the bulk of the differences between the quark and gluon jet mass distributions. Since the probability of the single large-angle, energetic emission is proportional to the color charge, the gluon distribution should be enhanced in this region by a factor of about $C_A/C_F = 9/4$, consistent with what is observed in Figure 8. Similarly, the exponent in the Sudakov damping factor for the gluon jet mass distribution is enhanced by the same factor, leading to a peak “pushed” further from the origin. Therefore, compared to a quark jet, the gluon jet mass distribution exhibits a larger average jet mass, with a larger relative contribution arising from the perturbative shoulder region and a small-mass peak that is further from the origin.

Fig. 5 Surface plots of $1/\epsilon_{\rm bkg}$ for the indicated pairs of variables from different classes considered for quark-gluon discrimination as functions of $R$ and $p_T$.

Fig. 6 Gluon rejection defined as $1/\epsilon_{\rm gluon}$ when using each 2-variable combination as a tagger with 50% acceptance for quark jets. Results are shown for jets with $p_T$ = 1-1.1 TeV and for (top left) $R = 0.4$; (top right) $R = 0.8$; (bottom) $R = 1.2$. The rejection obtained with a tagger that uses all variables is also shown in the plots.

2 The shoulder label will become more clear when examining groomed jet mass distributions.
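The two numbers quoted in this paragraph follow from short calculations, sketched here assuming massless constituents and the small-angle approximation, in which two constituents carrying fractions $z$ and $1-z$ of the jet $p_T$ at separation $\Delta R_{12}$ have $m^2 \simeq z(1-z)\,p_T^2\,\Delta R_{12}^2$. The anti-$k_T$ algorithm merges two particles into one jet only if $\Delta R_{12} < R$, and the color factors are those of SU(3) with $N_c = 3$.

```latex
\begin{align}
m^2 &\simeq z(1-z)\, p_T^2\, \Delta R_{12}^2 , \\
\frac{m}{p_T R} &\simeq \sqrt{z(1-z)}\,\frac{\Delta R_{12}}{R} \;<\; \frac{1}{2}
  \qquad \text{since } z(1-z) \le \tfrac{1}{4} \text{ and } \Delta R_{12} < R , \\
\frac{C_A}{C_F} &= \frac{N_c}{(N_c^2-1)/(2N_c)} = \frac{3}{4/3} = \frac{9}{4} .
\end{align}
```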

Together with the fact that the number of constituents in the jet is also larger (on average) for the gluon jet, simply because a gluon will radiate more than a quark, these features explain much of what we observed earlier in terms of the effectiveness of the various observables to separate quark jets from gluon jets. They also give us insight into the difference in the distributions for the observable $\Gamma_{\rm Qjet}$. Since the shoulder is dominated by a single large-angle, hard emission, it is minimally impacted by pruning, which is designed to remove the large-angle, soft constituents (as shown in more detail below). Thus, jets in the shoulder exhibit small volatility, and they are a larger component in the gluon jet distribution. Hence gluon jets, on average, have smaller values of $\Gamma_{\rm Qjet}$ than quark jets, as in Figure 1(b). Further, this feature of gluon jets is distinct from the fact that there are more constituents, explaining why $\Gamma_{\rm Qjet}$ and $n_{\rm constits}$ supply largely independent information for distinguishing quark and gluon jets.

Fig. 7 Gluon rejection defined as $1/\epsilon_{\rm gluon}$ when using each 2-variable combination as a tagger with 50% acceptance for quark jets. Results are shown for $R = 0.4$ jets with (top left) $p_T$ = 300-400 GeV, (top right) $p_T$ = 500-600 GeV and (bottom) $p_T$ = 1-1.1 TeV. The rejection obtained with a tagger that uses all variables is also shown in the plots.

[Fig. 8 panels: (a) Quark jets; (b) Gluon jets]

Fig. 8 Comparisons of quark and gluon ungroomed mass distributions versus the scaled variable $m/p_T/R$.

Fig. 9 Comparisons of quark and gluon pruned mass distributions versus the scaled variable $m_{\rm prun}/p_T/R$.
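The volatility itself is a simple ensemble statistic. The sketch below assumes the usual Q-jet definition, $\Gamma_{\rm Qjet} = \sqrt{\langle m^2\rangle - \langle m\rangle^2}/\langle m\rangle$ evaluated over pruned masses from many randomized clustering histories; the randomized clustering itself is not implemented, and the function name and mass lists are hypothetical. A jet whose mass comes from one hard, large-angle emission (the shoulder) keeps nearly the same pruned mass in every history and so has a small $\Gamma_{\rm Qjet}$.

```python
import math


def volatility(masses):
    """Gamma = sqrt(<m^2> - <m>^2) / <m> over an ensemble of jet masses,
    e.g. pruned masses from many randomised ("Q-jet") clustering histories."""
    if not masses:
        raise ValueError("need at least one mass")
    n = len(masses)
    mean = sum(masses) / n
    variance = sum((m - mean) ** 2 for m in masses) / n
    return math.sqrt(variance) / mean


if __name__ == "__main__":
    # Toy ensembles (hypothetical numbers, GeV): a jet whose pruned mass is
    # stable across clustering histories, and one whose mass often collapses.
    shoulder_like = [92.0, 90.0, 91.0, 89.0, 90.0]
    soft_dominated = [60.0, 15.0, 48.0, 5.0, 33.0]
    print(volatility(shoulder_like), volatility(soft_dominated))
```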

To illustrate some of these points in more detail, Figure 9 exhibits the same jet mass distributions after pruning [33, 60]. Removing the large-angle, soft constituents moves the peak in both of the distributions from $m/p_T/R \sim 0.1$-$0.2$ to the region around $m/p_T/R \sim 0.05$. This explains why pruning works to reduce the QCD background when looking for a signal in a specific jet mass bin. The shoulder feature at higher mass is much more apparent after pruning, as is the larger shoulder for the gluon jets. A quantitative (all-orders) understanding of groomed mass distributions is also possible; for instance, resummation of the pruned mass distribution was achieved in [37, 56]. Figure 9 serves to confirm the physical understanding of the relative behavior of $\Gamma_{\rm Qjet}$ for quark and gluon jets.

Our final topic in this section is the residual $R$ and $p_T$ dependence exhibited in Figures 8 and 9, which indicates a deviation from the naive linear scaling that has been removed by using the scaled variable $m/p_T/R$. A helpful, intuitively simple, if admittedly imprecise, model of a jet is to separate the constituents of the jet into “hard” (with $p_T$'s of order the jet $p_T$) versus “soft” (with $p_T$'s small and fixed compared to the jet $p_T$), and “large”-angle (with an angular separation from the jet direction of order $R$) versus “small”-angle (with an angular separation from the jet direction smaller than and not scaling with $R$) components. As described above, the Sudakov damping factor excludes constituents that are very soft or very small angle (or both). In this simple picture perturbative large-angle, hard constituents appear rarely, but, as described above, they characterize the large-mass jets that appear in the “shoulder” of the jet mass distribution, where the mass scales approximately linearly with the jet $p_T$ and with $R$. The hard, small-angle constituents are somewhat more numerous and contribute to a jet mass that does not scale with $R$. The soft constituents are much more numerous (becoming more numerous with increasing jet $p_T$) and contribute to a jet mass that scales like $\sqrt{p_{T,\rm jet}}$. The small-angle, soft constituents contribute to a jet mass that does not scale with $R$, while the large-angle, soft constituents do contribute to a jet mass that scales like $R$ and grow in number approximately linearly in $R$ (i.e., with the area of the annulus at the outer edge of the jet). This simple picture allows at least a qualitative explanation of the behavior observed in Figures 8 and 9.
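The qualitative model above can be made explicit with a single-emission toy, sketched here under stated assumptions: a jet of two massless prongs with $m/(p_T R) \simeq \sqrt{z(1-z)}\,\theta/R$; “hard” emissions carry a fixed fraction `z_hard` of the jet $p_T$ and “soft” ones a fixed transverse momentum `k_soft`; “large” angles are taken as $0.8R$ and “small” angles as a fixed `theta_small`. All parameter values and function names are hypothetical, and the growth in the number of soft, large-angle constituents with $R$ is not modeled.

```python
import math


def scaled_mass(z, theta, R):
    """m / (pT R) for two massless prongs carrying fractions z and 1 - z of the
    jet pT at angular separation theta (small-angle approximation)."""
    return math.sqrt(z * (1.0 - z)) * theta / R


def component(kind, jet_pt, R, z_hard=0.2, k_soft=2.0, theta_small=0.05):
    """Scaled mass from one emission in the hard/soft x large/small-angle picture.

    kind is "hard-large", "hard-small", "soft-large" or "soft-small". All defaults
    are illustrative assumptions: hard emissions carry a fixed fraction z_hard of
    the jet pT, soft ones a fixed pT k_soft (GeV); large angles scale with R,
    small ones do not.
    """
    hardness, angle = kind.split("-")
    z = z_hard if hardness == "hard" else k_soft / jet_pt
    theta = 0.8 * R if angle == "large" else theta_small
    return scaled_mass(z, theta, R)


if __name__ == "__main__":
    kinds = ("hard-large", "hard-small", "soft-large", "soft-small")
    for jet_pt in (300.0, 1000.0):
        for R in (0.4, 1.2):
            row = ", ".join(f"{k}={component(k, jet_pt, R):.4f}" for k in kinds)
            print(f"pT={jet_pt:6.0f} R={R}: {row}")
```

In this toy, the hard, large-angle value is unchanged under changes of $p_T$ and $R$, the small-angle values fall like $1/R$, and the soft values fall roughly like $1/\sqrt{p_T}$, which are the scalings invoked in the text.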

As already suggested, the residual $p_T$ dependence can be understood as arising primarily from the slow decrease of the strong coupling $\alpha_s(p_T)$ as $p_T$ increases. This leads to a corresponding decrease in the (largely perturbative) shoulder regime for both distributions at higher $p_T$, i.e., a decrease in the number of hard, large-angle constituents. At the same time, and for the same reason, the Sudakov damping is less strong with increasing $p_T$ and the peak moves in towards the origin. While the number of soft constituents increases with increasing jet $p_T$, their contributions to the scaled jet mass distribution shift to smaller values of $m/p_T$ (decreasing approximately like $1/\sqrt{p_T}$). Thus the overall impact of increasing $p_T$ for both distributions is a (gradual) shift to smaller values of $m/p_T/R$. This is just what is observed in Figures 8 and 9, although the numerical size of the effect is reduced in the pruned case.
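The size of the coupling effect can be gauged with the one-loop running coupling, $\alpha_s(Q) = \alpha_s(M_Z)/[1 + b_0\,\alpha_s(M_Z)\ln(Q^2/M_Z^2)]$ with $b_0 = (33 - 2n_f)/(12\pi)$. This is a rough sketch assuming $n_f = 5$ and a reference value $\alpha_s(M_Z) = 0.118$; it is not the coupling scheme of the parton shower used to generate the samples, and the function name is hypothetical.

```python
import math


def alpha_s_one_loop(q, alpha_mz=0.118, mz=91.1876, nf=5):
    """One-loop running coupling alpha_s(q), q in GeV, evolved from alpha_s(M_Z)."""
    b0 = (33.0 - 2.0 * nf) / (12.0 * math.pi)
    return alpha_mz / (1.0 + b0 * alpha_mz * math.log(q * q / (mz * mz)))


if __name__ == "__main__":
    for pt in (300.0, 500.0, 1000.0):  # lower edges of the pT bins studied
        print(f"alpha_s({pt:.0f} GeV) = {alpha_s_one_loop(pt):.4f}")
```

This gives $\alpha_s \approx 0.10$ at 300 GeV and $\approx 0.088$ at 1 TeV, a decrease of roughly 13%, consistent with the “slow” decrease invoked above.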

The residual $R$ dependence is somewhat more complicated. The perturbative large-angle, hard-constituent contribution largely scales in the variable $m/p_T/R$, which is why we see little residual $R$ dependence in either figure at higher masses ($m/p_T/R > 0.4$). The small-angle constituents (hard and soft) contribute at fixed $m$ and thus shift to the left versus the scaled variable as $R$ increases. This presumably explains the small shifts in this direction at small mass observed in both figures. The large-angle, soft constituents contribute to mass values that scale like $R$ and, as noted above, tend to increase in number as $R$ increases (i.e., as the area of the jet grows). Such contributions yield a scaled jet mass distribution that shifts to the right with increasing $R$ and presumably explain the behavior at small $p_T$ in Figure 8. Since pruning largely removes this contribution, we observe no such behavior in Figure 9.

5.5 Conclusions 

In Section 5 we have seen that a variety of jet observables provide information about the jet that can be employed to effectively separate quark-initiated from gluon-initiated jets. Further, when used in combination, these observables can provide superior separation. Since the improvement depends on the correlation between observables, we use the multivariable performance to separate the observables into different classes, with each class containing highly correlated observables. We saw that the best-performing single observable is simply the number of constituents in the jet, $n_{\rm constits}$, while the largest further improvement comes from combining it with $C_1^{\beta=1}$ (or $\tau_1^{\beta=1}$). The performance of this combined tagger is strongly dependent on $p_T$ and $R$, with the best performance being observed for smaller $R$ and higher $p_T$. The smallest $R$ and $p_T$ dependence arises from combining $n_{\rm constits}$ with $C_1^{\beta=0}$. Some of the commonly used observables for q/g tagging are highly correlated and do not provide extra information when used together. We have found that adding further variables to the $n_{\rm constits} + C_1^{\beta=1}$ or $n_{\rm constits} + \tau_1^{\beta=1}$ BDT combination results in only a small improvement in performance, suggesting that almost all of the available information to discriminate quark- and gluon-initiated jets is captured by the $n_{\rm constits}$ and $C_1^{\beta=1}$ (or $\tau_1^{\beta=1}$) variables. In addition to demonstrating these correlations, we have provided a discussion of the physics behind the structure of the correlations. Using the jet mass as an example, we have given arguments that explicitly explain the differences in jet observables between jets initiated by each type of parton.

Finally, we remind the reader that the numerical results were derived for a particular color configuration ($qq$ and $gg$ events), in a particular implementation of the parton shower and hadronization. Color connections in more complex event configurations, or different Monte Carlo programs, may well exhibit somewhat different efficiencies and rejection factors. The value of our results is that they indicate a subset of variables expected to be rich in information about the partonic origin of final-state jets. These variables can be expected to act as valuable discriminants in searches for new physics, and could also be used to define model-independent final-state measurements which would nevertheless be sensitive to the short-distance physics of quark and gluon production.

6 Boosted W-Tagging

In this section, we study the discrimination of a boosted, hadronically decaying $W$ boson (signal) against a gluon-initiated jet background, comparing the performance of various groomed jet masses and substructure variables. A range of different distance parameters for the anti-$k_T$ jet algorithm are explored, in a range of different leading-jet $p_T$ bins. This allows us to determine the performance of observables as a function of jet radius and jet boost, and to see where different approaches may break down. The groomed mass and substructure variables are then combined in a BDT as described in Section 4, and the performance of the resulting BDT discriminant is explored through ROC curves to understand the degree to which variables are correlated, and how this changes with jet boost and jet radius. Using BDT combinations of substructure variables to improve $W$ tagging has been studied earlier in [61].

6.1 Methodology 

These studies use the $WW$ samples as signal and the dijet $gg$ samples as background, described previously in Section 2. Whilst only gluonic backgrounds are explored here, the conclusions regarding the dependence of the performance and correlations on the jet boost and radius are not expected to be substantially different for quark backgrounds; we will see that the differences in the substructure properties of quark- and gluon-initiated jets, explored in the last section, are significantly smaller than the differences between $W$-initiated and gluon-initiated jets.

As in the q/g tagging studies, the showered events were clustered with FastJet 3.03 using the anti-$k_T$ algorithm with jet radii of $R = 0.4$, 0.8, 1.2. In both signal and background samples, an upper and lower cut on the leading-jet $p_T$ is applied after showering/clustering, to ensure similar $p_T$ spectra for signal and background in each $p_T$ bin. The bins in leading-jet $p_T$ that are considered are 300-400 GeV, 500-600 GeV and 1.0-1.1 TeV, for the 300-400 GeV, 500-600 GeV and 1.0-1.1 TeV parton $p_T$ slices, respectively. The jets then have various grooming algorithms applied and substructure observables reconstructed as described in Section 3.4. The substructure observables studied in this section are:

- Ungroomed, trimmed ($m_{\rm trim}$), and pruned ($m_{\rm prun}$) jet masses.

- Mass output from the modified mass drop tagger ($m_{\rm mmdt}$).

- Soft drop mass with $\beta = 2$ ($m_{\rm sd}$).

- 2-point energy correlation function ratio $C_2^{\beta=1}$ (we also studied $\beta = 2$ but do not show its results because it showed poor discrimination power).

- $N$-subjettiness ratio $\tau_2/\tau_1$ with $\beta = 1$ ($\tau_{21}^{\beta=1}$) and with axes computed using one-pass $k_T$ axis optimization (we also studied $\beta = 2$ but do not show its results because it showed poor discrimination power).

- Pruned Qjet mass volatility, $\Gamma_{\rm Qjet}$.
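For reference, the energy correlation double ratio in the list above can be sketched as follows, assuming the standard hadron-collider definitions ${\rm ECF}(N,\beta) = \sum_{i_1<\dots<i_N} \big(\prod_a p_{T,i_a}\big)\big(\prod_{a<b} \Delta R_{i_a i_b}\big)^{\beta}$ and $C_2^{\beta} = {\rm ECF}(3,\beta)\,{\rm ECF}(1,\beta)/{\rm ECF}(2,\beta)^2$; the definitions in Section 3.4 take precedence if they differ. The function names and the `(pt, y, phi)` input format are hypothetical, and the brute-force triple sum scales as the cube of the number of constituents.

```python
import math
from itertools import combinations


def delta_r(a, b):
    """Distance in the (rapidity, azimuth) plane between (pt, y, phi) tuples."""
    dphi = (a[2] - b[2] + math.pi) % (2.0 * math.pi) - math.pi  # wrap into [-pi, pi)
    return math.hypot(a[1] - b[1], dphi)


def ecf(constituents, n, beta):
    """N-point energy correlation function ECF(N, beta), by brute force."""
    total = 0.0
    for group in combinations(constituents, n):
        weight = math.prod(c[0] for c in group)  # product of the pTs in the group
        # product of all pairwise angles in the group (empty product = 1 for n == 1)
        angles = math.prod(delta_r(a, b) for a, b in combinations(group, 2))
        total += weight * angles ** beta
    return total


def c2(constituents, beta=1.0):
    """C_2^(beta) = ECF(3) ECF(1) / ECF(2)^2; needs >= 2 separated constituents."""
    e2 = ecf(constituents, 2, beta)
    return ecf(constituents, 3, beta) * ecf(constituents, 1, beta) / e2 ** 2


if __name__ == "__main__":
    two_prong = [(300.0, 0.0, 0.0), (200.0, 0.3, 0.1)]  # toy jets
    three_prong = two_prong + [(150.0, -0.2, 0.25)]
    print(c2(two_prong), c2(three_prong))
```

A jet with exactly two constituents has ${\rm ECF}(3,\beta) = 0$ and hence $C_2^{\beta} = 0$, which is why small $C_2^{\beta=1}$ values are characteristic of two-prong ($W$-like) jets.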


6.2 Single Variable Performance 

In this section we explore the performance of the various groomed jet mass and substructure variables in separating signal from background. Since we have not attempted to optimise the grooming parameter settings of each grooming algorithm, we do not place much emphasis here on the relative performance of the groomed masses, but instead concentrate on how their performance changes depending on the kinematic bin and jet radius considered.

Figure 10 compares the signal and background in terms of the different groomed masses explored for the anti-$k_T$ $R = 0.8$ algorithm in the $p_T$ = 500-600 GeV bin. One can clearly see that, in terms of separating signal and background, the groomed masses are significantly more performant than the ungroomed anti-$k_T$ $R = 0.8$ mass. Using the same jet radius and $p_T$ bin, Figure 11 compares signal and background for the different substructure variables studied.

Figures 12, 13 and 14 show the single-variable ROC curves for various $p_T$ bins and values of $R$. The single-variable performance is also compared to the ROC curve for a BDT combination of all the variables (labelled “allvars”). In all cases, the “allvars” option is significantly more performant than any of the individual single variables considered, indicating that there is considerable complementarity between the variables; this is explored further in Section 6.3.

In Figures 15, 16 and 17 the same information is shown in a format that more readily allows for a quantitative comparison of performance for different $R$ and $p_T$; matrices are presented which give the background rejection for a signal efficiency of 70%$^3$ for single-variable cuts, as well as two- and three-variable BDT combinations. The results are shown separately for each $p_T$ bin and jet radius considered. Most relevant for our immediate discussion, the diagonal entries of these plots show the background rejections for a single-variable BDT using the labelled observable, and can thus be examined to get a quantitative measure of the individual single-variable performance, and to study how this changes with jet radius and momenta. The off-diagonal entries give the performance when two variables (shown on the x-axis and on the y-axis, respectively) are combined in a BDT. The final column of these plots shows the background rejection performance for three-variable BDT combinations of $m_{\rm trim} + C_2^{\beta=1} + X$. These results will be discussed later in Section 6.3.3.

3 Note that we here choose to report the rejection for a higher signal efficiency than the 50% that was used in the q/g tagging studies of Section 5, because the rejection rates in W tagging are considerably higher.

In general, the most performant single variables are the groomed masses. However, in certain kinematic bins and for certain jet radii, $C_2^{\beta=1}$ has a background rejection that is comparable to or better than that of the groomed masses.

We first examine the variation of performance with jet $p_T$. By comparing Figures 15(a), 16(a) and 17(b), we can see how the background rejection performance varies with increased momenta whilst keeping the jet radius fixed to $R = 0.8$. Similarly, by comparing Figures 15(b), 16(b) and 17(c) we can see how performance evolves with $p_T$ for $R = 1.2$.

For both $R = 0.8$ and $R = 1.2$ the background rejection power of the groomed masses increases with increasing $p_T$, with a factor 1.5-2.5 increase in rejection in going from the 300-400 GeV to the 1.0-1.1 TeV bin. In Figure 18 we show two of the groomed masses, including $m_{\rm prun}$, for signal and background in the $p_T$ = 300-400 GeV and $p_T$ = 1.0-1.1 TeV bins for $R = 1.2$ jets. Two effects result in the improved performance of the groomed mass at high $p_T$. Firstly, as is evident from the figure, the resolution of the signal peak after grooming improves, because the groomer finds it easier to pick out the hard signal component of the jet against the softer components of the underlying event when the signal is boosted. Secondly, it follows from Figure 9 and the discussion in Section 5.4 that, for increasing $p_T$, the perturbative shoulder of the gluon distribution decreases in size, and thus there is a slight decrease (or at least no increase) of the background contamination in the signal mass region ($m/p_T/R \sim 0.5$).

However, one can see from Figures 15(b), 16(b) and 17(c) that the $C_2^{\beta=1}$, $\Gamma_{\rm Qjet}$ and $\tau_{21}^{\beta=1}$ substructure variables behave somewhat differently. The background rejection powers of the $\Gamma_{\rm Qjet}$ and $\tau_{21}^{\beta=1}$ variables both decrease with increasing $p_T$, by up to a factor two in going from the 300-400 GeV to the 1.0-1.1 TeV bin. Conversely, the rejection power of $C_2^{\beta=1}$ dramatically increases with increasing $p_T$ for $R = 0.8$, but does not improve with $p_T$ for the larger jet radius $R = 1.2$.

Fig. 10 Leading jet mass distributions in the gg background and WW signal samples in the $p_T$ = 500-600 GeV bin using the anti-$k_t$ $R = 0.8$ algorithm: (a) ungroomed mass, (b) pruned mass, (c) trimmed mass.

In Figure 19 we show the $\tau_{21}^{\beta=1}$ and $C_2^{\beta=1}$ distributions for signal and background in the $p_T$ = 300-400 GeV and $p_T$ = 1.0-1.1 TeV bins for $R = 0.8$ jets. For $\tau_{21}^{\beta=1}$ one can see that, in moving from lower to higher $p_T$ bins, the signal peak remains fairly unchanged, whereas the background peak shifts to smaller $\tau_{21}^{\beta=1}$ values, reducing the discriminating power of the variable. This is expected, since jet substructure methods that explicitly rely on the identification of hard prongs should work best at low $p_T$, where the prongs tend to be more widely separated. However, $C_2^{\beta=1}$ does not rely on the explicit identification of subjets, and one can see from Figure 19 that its discrimination power visibly increases with increasing $p_T$. This is in line with the observation in [44] that $C_2^{\beta=1}$ performs best when $m/p_T$ is small. The negative correlation between the discrimination power of $\Gamma_{\rm Qjet}$ and increasing $p_T$ can be understood in similar terms. As discussed in Section 5.4, the low-volatility component of a gluon jet, the “shoulder”, is enhanced as $p_T$ increases, leading to a background (QCD) volatility distribution that is more peaked at low values. In contrast, the signal (W) jets include more relatively soft radiation as $p_T$ increases, leading to a more volatile configuration. Thus, as $p_T$ increases, the signal jets exhibit a somewhat broader volatility distribution, while the background jets exhibit a somewhat narrower one; i.e., the distributions become more similar, reducing the discriminating power of $\Gamma_{\rm Qjet}$.
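The volatility argument can be made concrete with a few lines of code. The sketch below assumes the usual definition of Qjet volatility, $\Gamma_{\rm Qjet} = \sqrt{\langle m^2\rangle - \langle m\rangle^2}/\langle m\rangle$, evaluated over an ensemble of pruned masses, one per randomized clustering history. Generating those histories (the Qjets procedure itself) is not shown, and the input masses are made-up toy numbers, not values from this study.

```python
import math


def qjet_volatility(masses):
    """Return Gamma = sqrt(<m^2> - <m>^2) / <m> for an ensemble of pruned
    jet masses, one mass per randomized (Qjet) clustering history."""
    n = len(masses)
    if n == 0:
        raise ValueError("need at least one mass")
    mean = sum(masses) / n
    if mean <= 0:
        raise ValueError("mean mass must be positive")
    mean_sq = sum(m * m for m in masses) / n
    # max() guards against a tiny negative variance from rounding
    return math.sqrt(max(mean_sq - mean * mean, 0.0)) / mean


# Toy ensembles in GeV (illustrative numbers only)
stable_jet = [80.0, 81.0, 79.0, 80.5, 79.5]    # mass barely changes
volatile_jet = [80.0, 40.0, 95.0, 20.0, 70.0]  # mass depends on the history
print(f"stable:   {qjet_volatility(stable_jet):.4f}")
print(f"volatile: {qjet_volatility(volatile_jet):.4f}")
```

A jet whose pruned mass barely depends on the clustering history yields a small $\Gamma_{\rm Qjet}$; once the signal and background ensembles both drift toward intermediate values, the variable separates them less well.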

We now compare the performance of different jet radius parameters in the same $p_T$ bin by comparing the individual sub-figures of Figures 15, 16 and 17. To within ~25%, the background rejection power of the groomed masses remains constant with respect to the jet radius. Figure 20 shows how the groomed mass changes for varying jet radius in the $p_T$ = 1.0-1.1 TeV bin. One can see that the signal mass peak remains unaffected by the increased radius, as expected, since grooming removes the soft contamination that could otherwise increase the jet mass as the radius increases.

The gluon background in the signal mass region also remains largely unaffected, as follows from Figure 9 and the discussion in Section 5.4, where it is shown that there is very little dependence of the groomed gluon mass distribution on $R$ in the signal region ($m/p_T/R \sim 0.5$).

Fig. 11 Leading jet substructure variable distributions in the gg background and WW signal samples in the $p_T$ = 500-600 GeV bin using the anti-$k_t$ $R = 0.8$ algorithm; panel (c) shows $\Gamma_{\rm Qjet}$.

Fig. 12 ROC curves for single variables considered for W tagging in the $p_T$ = 300-400 GeV bin using the anti-$k_t$ $R = 0.8$ and $R = 1.2$ algorithms, along with a BDT combination of all variables (“allvars”).

Fig. 13 ROC curves for single variables considered for W tagging in the $p_T$ = 500-600 GeV bin using the anti-$k_t$ $R = 0.8$ and $R = 1.2$ algorithms, along with a BDT combination of all variables (“allvars”).

Fig. 14 ROC curves for single variables considered for W tagging in the $p_T$ = 1.0-1.1 TeV bin using the anti-$k_t$ $R = 0.4$, $R = 0.8$ and $R = 1.2$ algorithms, along with a BDT combination of all variables (“allvars”).

Fig. 15 The background rejection at a fixed signal efficiency (70%) for the BDT combination of each pair of variables considered, in the $p_T$ = 300-400 GeV bin using the anti-$k_t$ $R = 0.8$ and $R = 1.2$ algorithms. Also shown is the background rejection for three-variable combinations involving $m_{\rm sd}^{\beta=2} + C_2^{\beta=1}$, and for a BDT combination of all of the variables considered.

Fig. 16 The background rejection at a fixed signal efficiency (70%) for the BDT combination of each pair of variables considered, in the $p_T$ = 500-600 GeV bin using the anti-$k_t$ $R = 0.8$ and $R = 1.2$ algorithms. Also shown is the background rejection for three-variable combinations involving $m_{\rm sd}^{\beta=2} + C_2^{\beta=1}$, and for a BDT combination of all of the variables considered.

However, we again see rather different behaviour versus $R$ for the substructure variables. In all $p_T$ bins considered, the best-performing substructure variable, $C_2^{\beta=1}$, performs best for an anti-$k_t$ distance parameter of $R = 0.8$. The performance of this variable is dramatically worse for the larger jet radius of $R = 1.2$ (a factor of seven worse background rejection in the $p_T$ = 1.0-1.1 TeV bin), and substantially worse for $R = 0.4$. For the other jet substructure variables considered, $\Gamma_{\rm Qjet}$ and $\tau_{21}^{\beta=1}$, the background rejection power also decreases for larger jet radius, but not to the same extent. Figure 21 shows the $\tau_{21}^{\beta=1}$ and $C_2^{\beta=1}$ distributions for signal and background in the $p_T$ = 1.0-1.1 TeV bin for the $R = 0.8$ and $R = 1.2$ jet radii. For the larger jet radius, the $C_2^{\beta=1}$ distributions of both signal and background become wider, and consequently the discrimination power decreases. For $\tau_{21}^{\beta=1}$ there is comparatively little change in the distributions with increasing jet radius. The increased sensitivity of $C_2$ to soft, wide-angle radiation in comparison to $\tau_{21}$ is a known feature of this variable [44], and a useful one in discriminating coloured from colour-singlet jets. However, at very large jet radii ($R \sim 1.2$), this feature becomes disadvantageous: the jet can pick up a significant amount of initial-state or other uncorrelated radiation, and $C_2$ is more sensitive to this than is $\tau_{21}$. This uncorrelated radiation has no (or very little) dependence on whether the jet is W- or gluon-initiated, so sensitivity to it reduces the discrimination power. A similar description applies to $\Gamma_{\rm Qjet}$, and the story is very similar to that for increasing $p_T$. At larger $R$ the low-volatility “shoulder” is enhanced in the QCD background jet, leading to a narrower volatility distribution. For the W jet, the larger $R$ includes more uncorrelated radiation in the jet, leading to a broader volatility distribution. So, as with increasing $p_T$, increasing $R$ results in volatility distributions for signal and background jets that are more similar, and $\Gamma_{\rm Qjet}$ exhibits reduced discrimination power.
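To see why $C_2$ responds to soft, wide-angle radiation, it helps to write the observable out. The sketch below assumes the standard energy-correlation definitions $e_1 = \sum_i p_{T,i}$, $e_2 = \sum_{i<j} p_{T,i}p_{T,j}R_{ij}^\beta$, $e_3 = \sum_{i<j<k} p_{T,i}p_{T,j}p_{T,k}(R_{ij}R_{ik}R_{jk})^\beta$ and $C_2^{\beta} = e_3 e_1/e_2^2$, with particles given as hypothetical (pT, y, phi) tuples. It is a brute-force illustration, not the implementation used in this study. A pure two-particle jet has $e_3 = 0$, so any extra particle raises $C_2$, and more strongly the wider its angle.

```python
import itertools
import math


def delta_r(a, b):
    """Angular distance between particles a, b = (pt, y, phi), phi wrapped."""
    dphi = abs(a[2] - b[2]) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(a[1] - b[1], dphi)


def ecf(particles, n, beta):
    """Brute-force n-point energy correlation function (cost grows as len^n)."""
    total = 0.0
    for combo in itertools.combinations(particles, n):
        weight = math.prod(p[0] for p in combo)
        # product of all pairwise angles in the n-tuple (empty product = 1 for n = 1)
        angles = math.prod(delta_r(a, b) for a, b in itertools.combinations(combo, 2))
        total += weight * angles ** beta
    return total


def c2(particles, beta=1.0):
    """C_2^beta = e3 * e1 / e2^2."""
    e2 = ecf(particles, 2, beta)
    if e2 == 0.0:
        raise ValueError("C2 is undefined without two separated particles")
    return ecf(particles, 3, beta) * ecf(particles, 1, beta) / e2 ** 2


prongs = [(500.0, 0.0, 0.0), (500.0, 0.2, 0.0)]  # two hard prongs, Delta R = 0.2
narrow = prongs + [(5.0, 0.1, 0.3)]              # soft emission at small angle
wide = prongs + [(5.0, 0.1, 0.8)]                # same emission at a wider angle
for label, jet in [("two prongs", prongs), ("+ narrow", narrow), ("+ wide", wide)]:
    print(f"{label:11s} C2 = {c2(jet):.4f}")
```

The same soft particle carries only 0.5% of the jet $p_T$ in both toy jets, yet moving it outwards raises $C_2$ several-fold, which is the behaviour that hurts at $R = 1.2$.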

Fig. 17 The background rejection at a fixed signal efficiency (70%) for the BDT combination of each pair of variables considered, in the $p_T$ = 1.0-1.1 TeV bin using the anti-$k_t$ $R = 0.4$, $R = 0.8$ and $R = 1.2$ algorithms. Also shown is the background rejection for three-variable combinations involving $m_{\rm sd}^{\beta=2} + C_2^{\beta=1}$, and for a BDT combination of all of the variables considered.


6.3 Combined Performance 

Studying the improvement in performance (or lack thereof) when combining single variables into a multivariate analysis gives insight into the correlations among jet observables. The off-diagonal entries in Figures 15, 16 and 17 can be used to compare the performance of different two-variable BDT combinations, and to see how this varies as a function of $p_T$ and $R$. By comparing the background rejection achieved by the two-variable combinations to that of the “all variables” BDT, one can also understand how discrimination can be improved by adding further variables to the two-variable BDTs.
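The figure of merit used throughout this section, the background rejection $1/\epsilon_B$ at a fixed signal efficiency $\epsilon_S = 70\%$, can be computed directly from the output of any discriminant, whether a single variable or a BDT score. A minimal sketch, assuming larger scores are more signal-like and using made-up toy scores:

```python
import math


def rejection_at_efficiency(sig_scores, bkg_scores, target_eff=0.7):
    """Background rejection 1/eps_B for the cut 'score >= threshold', with the
    threshold chosen so that at least target_eff of the signal passes."""
    if not sig_scores or not bkg_scores:
        raise ValueError("need non-empty signal and background samples")
    ranked = sorted(sig_scores, reverse=True)
    # number of signal entries that must pass; the epsilon absorbs float noise
    n_pass = max(1, math.ceil(target_eff * len(ranked) - 1e-9))
    threshold = ranked[n_pass - 1]
    bkg_eff = sum(1 for s in bkg_scores if s >= threshold) / len(bkg_scores)
    return math.inf if bkg_eff == 0.0 else 1.0 / bkg_eff


signal = [i / 10 for i in range(1, 11)]  # toy scores 0.1 ... 1.0
background = [0.05, 0.1, 0.2, 0.3, 0.35, 0.45, 0.5, 0.2, 0.1, 0.0]
print(rejection_at_efficiency(signal, background))  # cut at 0.4 keeps 2 of 10 background jets
```

With tied scores at the threshold the realized signal efficiency can slightly exceed the target; with the large samples used in the study this is negligible.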

In general, the most powerful two-variable combinations involve a groomed mass and a non-mass substructure variable ($C_2^{\beta=1}$, $\Gamma_{\rm Qjet}$ or $\tau_{21}^{\beta=1}$). Two-variable combinations of the substructure variables alone are not as powerful. Which particular mass + substructure combination is the most powerful depends strongly on the $p_T$ and $R$ of the jet, as discussed in the sections that follow.

There is also a modest improvement in the background rejection when different groomed masses are combined, indicating that there is complementary information between the different groomed masses (first shown in [62]). In addition, there is an improvement in the background rejection when the groomed masses are combined with the ungroomed mass, indicating that grooming removes some useful discriminatory information from the jet. These observations are explored further in the sections below.

Fig. 18 The soft-drop ($\beta = 2$) and pruned groomed mass distributions for signal and background $R = 1.2$ jets in two different $p_T$ bins: (a) $p_T$ = 300-400 GeV, (b) $p_T$ = 1.0-1.1 TeV.


Generally, the $R = 0.8$ jets offer the best two-variable combined performance in all $p_T$ bins explored here. This is despite the fact that in the highest $p_T$ = 1.0-1.1 TeV bin the average separation of the quarks from the W decay is much smaller than 0.8, and well within 0.4. This conclusion could of course be susceptible to pile-up, which is not considered in this study. It is in marked contrast to the $R$ dependence of the q/g tagging performance shown in Section 5, where a monotonic improvement in performance with decreasing $R$ is observed.
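The statement about the W decay products can be checked with the standard small-angle estimate for the two-body decay of a boosted particle of mass $m$, $\Delta R \approx m/(p_T\sqrt{z(1-z)})$, which reduces to $2m/p_T$ for a symmetric decay ($z = 1/2$). This approximation, the value $m_W \approx 80.4$ GeV, and the use of bin centres are assumptions of the sketch, not results of the study.

```python
import math

M_W = 80.4  # W boson mass in GeV (approximate value, assumed here)


def prong_separation(mass, pt, z=0.5):
    """Approximate angular separation of the two decay products of a boosted
    particle, Delta R ~ m / (pT * sqrt(z (1 - z))), valid for pT >> m.
    z is the momentum fraction carried by one prong; z = 0.5 gives 2m/pT."""
    if not 0.0 < z < 1.0:
        raise ValueError("z must lie strictly between 0 and 1")
    return mass / (pt * math.sqrt(z * (1.0 - z)))


for lo, hi in [(300, 400), (500, 600), (1000, 1100)]:
    pt = 0.5 * (lo + hi)  # bin centre in GeV
    print(f"pT = {lo}-{hi} GeV: Delta R ~ {prong_separation(M_W, pt):.2f}")
```

For symmetric decays this gives roughly 0.46, 0.29 and 0.15 in the three bins, so in the highest bin the two quarks are typically well inside an $R = 0.4$ jet; asymmetric decays ($z$ far from 1/2) are more widely separated.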

6.3.1 Mass + Substructure Performance 

As already noted, the largest background rejection at 70% signal efficiency is in general achieved using the two-variable BDT combinations that involve a groomed mass and a non-mass substructure variable. We now investigate the $p_T$ and $R$ dependence of the performance of these combinations.

For both $R = 0.8$ and $R = 1.2$ jets, the rejection power of these two-variable combinations increases substantially with increasing $p_T$, at least within the $p_T$ range considered here.

Fig. 19 The $\tau_{21}^{\beta=1}$ and $C_2^{\beta=1}$ distributions for signal and background $R = 0.8$ jets in two different $p_T$ bins.

For a jet radius of $R = 0.8$, across the full $p_T$ range considered, the groomed mass + substructure variable combinations with the largest background rejection are those which involve $C_2^{\beta=1}$. For example, in combination with $m_{\rm sd}^{\beta=2}$, this produces a five-, eight- and fifteen-fold increase in background rejection compared to using the groomed mass alone. Figure 22 shows 2-D histograms of $m_{\rm sd}^{\beta=2}$ versus $C_2^{\beta=1}$ for $R = 0.8$ jets in the various $p_T$ bins considered, for both signal and background. The relatively low degree of correlation between $m_{\rm sd}^{\beta=2}$ and $C_2^{\beta=1}$ that leads to these large improvements in background rejection can be seen. What little correlation exists is rather non-linear in nature, changing from a negative to a positive correlation as a function of the groomed mass, which helps to improve the background rejection in the region of the W mass peak.
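A single correlation coefficient would hide the sign change described above. One simple way to expose it, sketched here under the assumption that per-jet values of the groomed mass and the shape variable are available as plain lists, is to compute the Pearson coefficient separately in slices of groomed mass. The toy inputs are invented to show a negative correlation below an 80 GeV peak and a positive one above it; they are not the distributions of Figure 22.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; nan if either sample is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def sliced_correlation(masses, shapes, edges, min_entries=3):
    """Correlation of a shape variable with mass in each slice [lo, hi).
    Slices with fewer than min_entries jets give None."""
    result = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        pairs = [(m, s) for m, s in zip(masses, shapes) if lo <= m < hi]
        if len(pairs) < min_entries:
            result.append(None)
            continue
        ms, ss = zip(*pairs)
        result.append(pearson(ms, ss))
    return result


# Toy jets: the shape value grows with distance from an 80 GeV peak
masses = [60.0, 65.0, 70.0, 75.0, 85.0, 90.0, 95.0, 100.0]
shapes = [abs(m - 80.0) for m in masses]
print(pearson(masses, shapes))
print(sliced_correlation(masses, shapes, [60.0, 80.0, 101.0]))
```

For this toy the inclusive coefficient is exactly zero, while the two mass slices give -1 and +1: a cut-based or BDT selection inside a mass window can exploit structure that a global correlation number would suggest is absent.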

However, when we switch to a jet radius of $R = 1.2$ the picture for $C_2^{\beta=1}$ combinations changes dramatically. These become significantly less powerful, and the most powerful variable in groomed mass combinations becomes $\tau_{21}^{\beta=1}$ for all jet $p_T$ considered. Figure 23 shows the correlation between $m_{\rm sd}^{\beta=2}$ and $C_2^{\beta=1}$ in the $p_T$ = 1.0-1.1 TeV bin for the various jet radii considered. Figure 24 is the equivalent set of distributions for $m_{\rm sd}^{\beta=2}$ and $\tau_{21}^{\beta=1}$. One can see from Figure 23 that, due to the sensitivity of the observable to soft, wide-angle radiation, as the jet radius increases $C_2^{\beta=1}$ increases and becomes more and more smeared out for both signal and background, leading to worse discrimination power. This does not happen to the same extent for $\tau_{21}^{\beta=1}$. We can see from Figure 24 that the negative correlation between $m_{\rm sd}^{\beta=2}$ and $\tau_{21}^{\beta=1}$ that is clearly visible for $R = 0.4$ decreases for larger jet radius, such that the groomed mass and substructure variable are far less correlated and $\tau_{21}^{\beta=1}$ offers improved discrimination within a $m_{\rm sd}^{\beta=2}$ mass window.

6.3.2 Mass + Mass Performance 

The different groomed masses and the ungroomed mass are of course not fully correlated, and thus one can always see some improvement in the background rejection when two different mass variables are combined in the BDT. However, in some cases the improvement can be dramatic, particularly at higher $p_T$, and particularly for combinations with the ungroomed mass. For example, in Figure 17 we can see that in the $p_T$ = 1.0-1.1 TeV bin, the combination of pruned mass with ungroomed mass produces a greater than eight-fold improvement in the background rejection for $R = 0.4$ jets, a greater than five-fold improvement for $R = 0.8$ jets, and a factor of ~2 improvement for $R = 1.2$ jets. A similar behaviour can be seen for the mMDT mass. In Figures 25, 26 and 27, we show the 2-D correlation plots of the pruned mass versus the ungroomed mass separately for the WW signal and gg background samples in the $p_T$ = 1.0-1.1 TeV bin, for the various jet radii considered. For comparison, the correlation of the trimmed mass with the ungroomed mass, a combination that does not improve on the single mass as dramatically, is also shown. In all cases one can see that there is a much smaller degree of correlation between the pruned mass and the ungroomed mass in the background sample than between the trimmed mass and the ungroomed mass. This is most obvious in Figure 25, where the high degree of correlation between the trimmed and ungroomed mass is expected, since with the parameters used (in particular $R_{\rm trim} = 0.2$) we cannot expect trimming to have a significant impact on an $R = 0.4$ jet. The reduced correlation with ungroomed mass for pruning in the background means that, once we have required the pruned mass to be consistent with a W (i.e. ~80 GeV), a relatively large difference between signal and background in the ungroomed mass still remains, and can be exploited to improve the background rejection further. In other words, many of the background events which pass the pruned mass requirement do so because they are shifted to lower mass (into the signal mass window) by the grooming, but these events still look very much like background events before the grooming. A requirement on the groomed mass alone does not exploit this property. Of course, the impact of pile-up, not considered in this study, could limit the degree to which the ungroomed mass can be used to improve discrimination in this way.

Fig. 20 The soft-drop ($\beta = 2$) and pruned groomed mass distributions for signal and background $R = 0.4$ and $R = 1.2$ jets in the $p_T$ = 1.0-1.1 TeV bin.

Fig. 21 The $\tau_{21}^{\beta=1}$ and $C_2^{\beta=1}$ distributions for signal and background $R = 0.8$ and $R = 1.2$ jets in the $p_T$ = 1.0-1.1 TeV bin.

6.3.3 “All Variables” Performance 

Figures 15, 16 and 17 report the background rejection achieved by combining all the variables considered into a single BDT discriminant. In all cases, the rejection power of this “all variables” BDT is significantly larger than that of the best two-variable combination. This indicates that, beyond the best two-variable combination, there is still significant complementary information available in the remaining observables to improve the discrimination of signal and background. How much complementary information is available appears to be $p_T$ dependent. In the lower $p_T$ = 300-400 and 500-600 GeV bins, the background rejection of the “all variables” combination is a factor ~1.5 greater than that of the best two-variable combination, but in the highest $p_T$ bin it is a factor ~2.5 greater.

The final column in Figures 15, 16 and 17 allows us to further explore the “all variables” performance relative to the pair-wise performance. It shows the background rejection for three-variable BDT combinations of $m_{\rm sd}^{\beta=2} + C_2^{\beta=1} + X$, where $X$ is the variable on the y-axis. For jets with $R = 0.4$ and $R = 0.8$, the combination $m_{\rm sd}^{\beta=2} + C_2^{\beta=1}$ is (at least close to) the best-performing two-variable combination in every $p_T$ bin considered. For $R = 1.2$ this is not the case, as $C_2^{\beta=1}$ is superseded in performance by $\tau_{21}^{\beta=1}$, as discussed earlier. Thus, in considering the three-variable combination results, it is simplest to focus on the $R = 0.4$ and $R = 0.8$ cases. Here we see that, for the lower $p_T$ = 300-400 and 500-600 GeV bins, adding the third variable to the best two-variable combination brings us to within ~15% of the “all variables” background rejection. However, in the highest $p_T$ = 1.0-1.1 TeV bin, whilst adding the third variable does improve the performance considerably, we are still ~40% away from the observed “all variables” background rejection, and clearly adding a fourth or maybe even fifth variable would bring considerable gains. In terms of which variable offers the best improvement when added to the $m_{\rm sd}^{\beta=2} + C_2^{\beta=1}$ combination, it is hard to see an obvious pattern; the best third variable changes depending on the $p_T$ and $R$ considered.

It appears that there is a rich and complex structure in terms of the degree to which the discriminatory information provided by the set of variables considered overlaps, with the degree of overlap apparently decreasing at higher $p_T$. This suggests that in all $p_T$ ranges, but especially at higher $p_T$, there are substantial performance gains to be made by designing a more complex multivariate W tagger.

Fig. 22 2-D histograms of $m_{\rm sd}^{\beta=2}$ versus $C_2^{\beta=1}$ distributions for $R = 0.8$ jets in the various $p_T$ bins considered, shown separately for signal and background.

Fig. 23 2-D histograms of $m_{\rm sd}^{\beta=2}$ versus $C_2^{\beta=1}$ for $R = 0.4$, 0.8 and 1.2 jets in the $p_T$ = 1.0-1.1 TeV bin, shown separately for signal and background.

Fig. 25 2-D histograms of groomed mass versus ungroomed mass in the $p_T$ = 1.0-1.1 TeV bin using the anti-$k_t$ $R = 0.4$ algorithm, shown separately for signal and background; panel (b) shows trimmed mass vs ungroomed mass.

6.4 Conclusions 

We have studied the performance, in terms of the separation of a hadronically decaying W boson from a gluon-initiated jet background, of a number of groomed jet masses, substructure variables, and BDT combinations of the above. We have used this to gain insight into how the discriminatory information contained in the variables overlaps, and how this complementarity between the variables changes with jet $p_T$ and anti-$k_t$ distance parameter $R$.

Fig. 26 2-D histograms of groomed mass versus ungroomed mass in the $p_T$ = 1.0-1.1 TeV bin using the anti-$k_t$ $R = 0.8$ algorithm, shown separately for signal and background.

In terms of the performance of individual variables, we find that, in agreement with other studies [40], the groomed masses generally perform best, with a background rejection power that increases with larger $p_T$ but is more consistent with respect to changes in $R$. We have explained the dependence of the groomed mass performance on $p_T$ and $R$ using the understanding of the QCD mass distribution developed in Section 5.4. Conversely, the performance of other substructure variables, such as $C_2^{\beta=1}$ and $\tau_{21}^{\beta=1}$, is more susceptible to changes in radius, with background rejection power decreasing with increasing $R$. This is due to the inherent sensitivity of these observables to soft, wide-angle radiation.

The best two-variable performance is obtained by combining a groomed mass with a substructure variable. Which particular substructure variable works best in combination depends strongly on $p_T$ and $R$. The variable $C_2^{\beta=1}$ offers significant complementarity to groomed mass for the smaller values of $R$ investigated ($R = 0.4$ and 0.8), owing to the small degree of correlation between the variables. However, the sensitivity of $C_2^{\beta=1}$ to soft, wide-angle radiation leads to worse discrimination power at $R = 1.2$, where $\tau_{21}^{\beta=1}$ performs better in combination. The best two-variable performance in each $p_T$ bin examined is obtained for $C_2^{\beta=1}$ in combination with a groomed mass, using $R = 0.8$, with a performance that is better at higher $p_T$. Our studies also demonstrate the potential for enhancing discrimination by combining groomed and ungroomed mass information, although the use of ungroomed mass in this way may be limited in practice by the presence of pile-up, which is not considered in these studies.

By examining the performance of a BDT combination of all variables considered, it is clear that there are potentially substantial performance gains to be made by designing a more complex multivariate W tagger, especially at higher $p_T$.

Fig. 27 2-D histograms of groomed mass versus ungroomed mass in the $p_T$ = 1.0-1.1 TeV bin using the anti-$k_t$ $R = 1.2$ algorithm, shown separately for signal and background; panel (b) shows trimmed mass vs ungroomed mass.

7 Top Tagging 

In this section, we investigate the identification of boosted top quarks using jet substructure. Boosted top quarks result in large-radius jets with complex substructure, containing a $b$-subjet and a boosted W. As a consequence of the many kinematic differences between top and QCD jets, top taggers are typically complex, with a couple of input parameters necessary for any given algorithm. We study the variation in performance of top-tagging techniques with respect to jet $p_T$ and $R$, re-optimizing the tagger inputs for each kinematic range and jet radius considered. We also investigate the effects of combining dedicated top-tagging algorithms with other jet substructure variables, giving insight into the correlations among top-tagging variables.

7.1 Methodology 

We use the top quark MC samples for each bin described in Section 2.2. The analysis relies on FastJet 3.0.3 for jet clustering and calculation of jet substructure variables. Jets are clustered using the anti-$k_t$ algorithm, and only the leading jet is used in each analysis. To ensure similar $p_T$ spectra in each bin, an upper and a lower $p_T$ cut are applied to each sample after jet clustering. The bins in leading-jet $p_T$ for top tagging are 600-700 GeV, 1-1.1 TeV, and 1.5-1.6 TeV. Jets are clustered with radii $R = 0.4$, 0.8, and 1.2; $R = 0.4$ jets are only studied in the 1.5-1.6 TeV bin because the top decay products are all contained within an $R = 0.4$ jet for top quarks with this boost.

We study a number of top-tagging strategies, which can 
be divided into two distinct categories. In the first category 
are dedicated top-tagging algorithms, which aim to directly 
reconstruct the top and W candidates in the top decay. In 
particular, we study: 

1. HEPTopTagger 

2. Johns Hopkins Tagger (JH) 

3. Trimming with W-identification

4. Pruning with W-identification

as described in Section 3.3. In the case of the HEPTopTagger and JH tagger, the algorithms produce three output variables ($m_t$, $m_W$ and helicity angle) that can be used to discriminate top jets from QCD. The trimming and pruning algorithms as used here produce two outputs, $m_t$ and $m_W$. All of the above taggers and groomers incorporate a step to remove contributions from the underlying event and other soft radiation to the reconstructed $m_t$ and $m_W$, and also explicitly reject jets that do not meet basic selection criteria, as explained in detail in Section 3.3.

In the second category are individual jet substructure variables that are sensitive to the radiation pattern within the jet, which we refer to as “jet-shape variables”. While the most sensitive top-tagging variables are typically sensitive to three-pronged radiation, we also consider variables sensitive to two-pronged radiation in the limit where the W is very boosted and its subjets overlap. The variables we consider are:

- The ungroomed jet mass.

- $N$-subjettiness ratios $\tau_{21}^{\beta=1}$ and $\tau_{32}^{\beta=1}$, using the “winner-takes-all” axes definition.

- 2-point energy correlation function ratios $C_2^{\beta=1}$ and $C_3^{\beta=1}$.

- The pruned Qjet mass volatility, $\Gamma_{\rm Qjet}$.

Several of these variables were also considered earlier for q/g tagging and W tagging.
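For reference, the $N$-subjettiness ratios listed above are built from $\tau_N = \sum_i p_{T,i}\min_k \Delta R_{i,k}^\beta / \sum_i p_{T,i}R_0^\beta$, with $k$ running over $N$ candidate subjet axes, and $\tau_{21} = \tau_2/\tau_1$, $\tau_{32} = \tau_3/\tau_2$. The sketch below takes the axes as an input and, purely for illustration, uses the directions of the $N$ hardest particles (the hypothetical helper `hardest_axes`); this is a simplification, not the winner-takes-all axis definition used in this study. The particle tuples (pT, y, phi) and the toy jet are invented.

```python
import math


def delta_r(y1, phi1, y2, phi2):
    """Distance in the (rapidity, azimuth) plane with phi wrapped."""
    dphi = abs(phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(y1 - y2, dphi)


def tau_n(particles, axes, beta=1.0, r0=1.0):
    """Normalized N-subjettiness for particles (pt, y, phi) and axes (y, phi)."""
    num = sum(pt * min(delta_r(y, phi, ay, aphi) for ay, aphi in axes) ** beta
              for pt, y, phi in particles)
    den = sum(pt for pt, _, _ in particles) * r0 ** beta
    return num / den


def hardest_axes(particles, n):
    """Illustrative stand-in for an axis finder: the n hardest particles."""
    ranked = sorted(particles, key=lambda p: p[0], reverse=True)
    return [(y, phi) for _, y, phi in ranked[:n]]


# Hypothetical three-prong jet with two soft particles near two of the prongs
jet = [(400.0, 0.0, 0.0), (300.0, 0.3, 0.0), (300.0, 0.0, 0.3),
       (5.0, 0.05, 0.02), (5.0, 0.32, 0.03)]
tau = {n: tau_n(jet, hardest_axes(jet, n)) for n in (1, 2, 3)}
print(f"tau21 = {tau[2] / tau[1]:.3f}, tau32 = {tau[3] / tau[2]:.3f}")
```

For this three-pronged toy $\tau_{32}$ is far below $\tau_{21}$, which is why $\tau_{32}$ is the natural ratio for top tagging and $\tau_{21}$ for W tagging.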

To study the correlations amongst the above substructure variables and tagging algorithms, we combine the relevant tagger output variables and/or jet shapes into a BDT, as described in Section 4. Additionally, because each tagger has two input parameters, we scan over reasonable values of the input parameters to determine the optimal values that give the largest background rejection for each top-tagging signal efficiency. This allows a direct comparison of the optimized version of each tagger. The input parameter values scanned for the various algorithms are:

- HEPTopTagger: $m \in [30, 100]$ GeV, $\mu \in [0.5, 1]$

- JH Tagger: $\delta_p \in [0.02, 0.15]$, $\delta_R \in [0.07, 0.2]$

- Trimming: $f_{\rm cut} \in [0.02, 0.14]$, $R_{\rm trim} \in [0.1, 0.5]$

- Pruning: $z_{\rm cut} \in [0.02, 0.14]$, $R_{\rm cut} \in [0.1, 0.6]$

We also investigate the degradation in performance of the top-tagging variables when moving away from the optimal parameter choice.

4 Similar studies were recently performed for the HEPTopTagger in [63, 64], in the context of trying to improve the tagger by combining its outputs with $N$-subjettiness.
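The optimization described above amounts to taking, in each signal-efficiency bin, the best background rejection over all scanned parameter points. A minimal sketch of that bookkeeping, assuming each parameter point has already been evaluated and summarized as a list of (signal efficiency, background rejection) working points; the grid spacing, the helper name `optimal_envelope` and the toy numbers are hypothetical:

```python
from itertools import product


def optimal_envelope(roc_by_params, eff_bins):
    """For each signal-efficiency bin (lo, hi), return (best rejection,
    parameters) over all scanned parameter points; empty bins are omitted."""
    best = {}
    for params, working_points in roc_by_params.items():
        for eff, rej in working_points:
            for lo, hi in eff_bins:
                current = best.get((lo, hi), (float("-inf"), None))[0]
                if lo <= eff < hi and rej > current:
                    best[(lo, hi)] = (rej, params)
    return best


# Hypothetical pruning grid spanning z_cut in [0.02, 0.14], R_cut in [0.1, 0.6]
z_cuts = [0.02 + 0.04 * i for i in range(4)]
r_cuts = [0.1 + 0.1 * i for i in range(6)]
grid = list(product(z_cuts, r_cuts))
print(len(grid), "parameter points to evaluate")

# Toy results for two parameter points (invented numbers)
toy = {
    (0.06, 0.2): [(0.30, 40.0), (0.50, 12.0)],
    (0.10, 0.3): [(0.32, 35.0), (0.52, 15.0)],
}
print(optimal_envelope(toy, [(0.30, 0.35), (0.50, 0.55)]))
```

Different parameter points can win in different efficiency bins, which is why the optimal working point is quoted per bin (e.g. the 0.3-0.35 bin used for the mass distributions below).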

7.2 Single Variable Performance 

We begin by investigating the behaviour of individual jet substructure variables. Because of the rich, three-pronged structure of the top decay, it is expected that combinations of masses and jet shapes will far outperform single variables in identifying boosted tops. However, a study of the top-tagging performance of single variables facilitates a direct comparison with the W-tagging results in Section 6, and also allows a straightforward examination of the performance of each variable for different $p_T$ and jet radius.

Top-tagging performance is quantified using ROC curves. Figure 28 shows the ROC curves for each of the top-tagging variables, with the bare (ungroomed) jet mass also plotted for comparison. The jet-shape variables all perform substantially worse than the ungroomed jet mass; this is in contrast with W tagging, for which several variables are competitive with or perform better than ungroomed jet mass (see, for example, Figures 16(a), 17(a) and 17(b)). To understand why this is the case, consider $N$-subjettiness: the W is two-pronged and the top is three-pronged, so we expect $\tau_{21}$ and $\tau_{32}$, respectively, to be the best-performing $N$-subjettiness ratios. However, a cut selecting small values of $\tau_{21}$ necessarily selects events with large $\tau_1$, which is strongly correlated with jet mass, up to exponentially suppressed contributions. Therefore, $\tau_{21}$ applied to W tagging indirectly incorporates some information about the jet mass in addition to shape information. By contrast, $\tau_{32}$ applied to top tagging does not include any information on the ungroomed jet mass. This likely accounts for why, relative to a cut on ungroomed mass, $\tau_{32}$ for top tagging performs substantially worse than $\tau_{21}$ for W tagging.
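The link between $\tau_1$ and the jet mass can be made explicit in a special case. Assume $\beta = 2$ (rather than the $\beta = 1$ used in this study, for which the relation is only approximate), massless constituents, the small-angle limit, and a single axis $\hat n$ at the $p_T$-weighted centroid of the constituents, with $p_T = \sum_i p_{T,i}$. Then

$$
\sum_i p_{T,i}\,\Delta R_{i\hat n}^2 = \frac{1}{p_T}\sum_{i<j} p_{T,i}\,p_{T,j}\,\Delta R_{ij}^2 \simeq \frac{m^2}{p_T}
\quad\Longrightarrow\quad
\tau_1^{(\beta=2)} = \frac{\sum_i p_{T,i}\,\Delta R_{i\hat n}^2}{p_T\,R_0^2} \simeq \frac{m^2}{p_T^2\,R_0^2}.
$$

The first equality is the usual identity for a weighted variance; the second uses $m^2 = \sum_{i<j} 2p_{T,i}p_{T,j}(\cosh\Delta y_{ij} - \cos\Delta\phi_{ij}) \simeq \sum_{i<j} p_{T,i}p_{T,j}\Delta R_{ij}^2$. At fixed $\tau_2$, a small $\tau_{21}$ therefore requires a large $m^2/p_T^2$, which is the implicit mass information carried by $\tau_{21}$.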

Fig. 28 Comparison of single-variable top-tagging performance in the $p_T$ = 1-1.1 TeV bin using the anti-$k_t$ $R = 0.8$ algorithm.

Of the two top-tagging algorithms, it is apparent from Figure 28 that the Johns Hopkins tagger outperforms the HEPTopTagger in terms of its background rejection at fixed signal efficiency for both the top and W candidate masses; this is expected, as the HEPTopTagger was designed to reconstruct moderate-$p_T$ top jets in $t\bar{t}H$ events (for a proposed high-$p_T$ variant of the HEPTopTagger, see [65]). In Figure 29, we show the histograms for the top mass output from the JH tagger and HEPTopTagger for different $R$ in the $p_T$ = 1.5-1.6 TeV bin, and in Figure 30 for different $p_T$ at $R = 0.8$, optimized at a signal efficiency of 30%. A particular feature of the HEPTopTagger algorithm is that, after the jet is filtered to select the five hardest subjets, the three subjets that most closely reconstruct the top mass are chosen. This requirement tends to shape a peak in the QCD background around $m_t$ for the HEPTopTagger, as can be seen from Figures 29(d) and 30(d); this is the likely reason for the better performance of the JH tagger, which has no such requirement. This effect is more pronounced at higher $p_T$ and larger jet radius (see Figures 32 and 35). It has been proposed [63, 64] that the performance of the HEPTopTagger may be improved by changing the selection criteria and/or performing a multivariate analysis with other variables. For example, the three subjets reconstructing the top should be selected only among those sets that pass the W mass constraints, which reduces the shaping of the background. We indeed confirm below that combining the HEPTopTagger with other variables reduces the discrepancy between the JH tagger and the HEPTopTagger, and a preliminary study indicates that the new ordering prescriptions make the tagger performances more comparable.
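The background-shaping mechanism can be seen in a stripped-down version of the triplet-selection step. The sketch below assumes the subjets are already given as four-vectors (E, px, py, pz) in GeV and uses $m_t = 173$ GeV as the target; it deliberately omits the mass-drop decomposition, filtering and W-mass constraints of the full HEPTopTagger, so it illustrates only the selection logic. Because the triplet is picked to minimize $|m_{123} - m_t|$, whatever subjets a QCD jet happens to contain are combined so as to land as close to $m_t$ as possible.

```python
import itertools
import math

M_TOP = 173.0  # GeV; approximate top-quark mass, assumed as the target


def inv_mass(vectors):
    """Invariant mass of the summed four-vectors (E, px, py, pz)."""
    e, px, py, pz = (sum(v[k] for v in vectors) for k in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def pt(v):
    return math.hypot(v[1], v[2])


def closest_triplet(subjets, n_keep=5, target=M_TOP):
    """Among the n_keep hardest subjets (by pT), pick the three whose combined
    invariant mass is closest to target; return (triplet, mass)."""
    hardest = sorted(subjets, key=pt, reverse=True)[:n_keep]
    if len(hardest) < 3:
        raise ValueError("need at least three subjets")
    triplet = min(itertools.combinations(hardest, 3),
                  key=lambda trip: abs(inv_mass(trip) - target))
    return triplet, inv_mass(triplet)


# Hypothetical (approximately massless) subjets of a QCD-like jet
subjets = [(300.0, 300.0, 0.0, 0.0), (120.0, 100.0, 66.3325, 0.0),
           (80.0, 60.0, -52.915, 0.0), (40.0, 30.0, 0.0, 26.458),
           (25.0, 15.0, 20.0, 0.0), (5.0, 3.0, 4.0, 0.0)]
triplet, mass = closest_triplet(subjets)
print(f"selected triplet mass = {mass:.1f} GeV")
```

With up to ten triplets to choose from, the selected mass is pulled toward $m_t$ regardless of the jet's origin; restricting the candidates to triplets that also satisfy W-mass constraints, as proposed in [63, 64], shrinks that freedom.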

We also see in Figure 28(b) that the top mass from the JH tagger and the HEPTopTagger has superior performance relative to either of the grooming algorithms; this is because the pruning and trimming algorithms do not have inherent W-identification steps and are not optimized for this purpose. Indeed, because of the lack of a W-identification step, grooming algorithms are forced to strike a balance between under-grooming the jet, which broadens the signal peak due to underlying-event contamination and features a larger background rate, and over-grooming the jet, which occasionally throws out the $b$-jet and preserves only the W components inside the jet. We demonstrate this effect in Figures 29 and 30, showing that at 30% signal efficiency, the optimal performance of the tagger over-grooms a substantial fraction of the jets (~20-30%), leading to a spurious second peak at $m_W$. This effect is more pronounced at large $R$ and $p_T$, since more aggressive grooming is required in these limits to combat the increased contamination from underlying event and QCD radiation.
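The over-grooming effect can be illustrated with the core trimming criterion alone. In this sketch the subjets are assumed to be already formed (in the full algorithm the constituents are first reclustered into subjets of radius $R_{\rm trim}$), and a subjet is kept only if its $p_T$ exceeds $f_{\rm cut}$ times the jet $p_T$. The four-vectors (E, px, py, pz) and the helper `massless` are hypothetical: two hard subjets stand in for the W decay products and a softer one for the $b$-jet.

```python
import math


def pt(v):
    return math.hypot(v[1], v[2])


def inv_mass(vectors):
    """Invariant mass of summed four-vectors (E, px, py, pz)."""
    e, px, py, pz = (sum(v[k] for v in vectors) for k in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def trim(subjets, f_cut):
    """Keep subjets with pT > f_cut * pT(jet), pT(jet) taken from the vector sum."""
    jet = tuple(sum(v[k] for v in subjets) for k in range(4))
    return [v for v in subjets if pt(v) > f_cut * pt(jet)]


def massless(px, py, pz):
    """Hypothetical helper: four-vector of a massless subjet."""
    return (math.sqrt(px * px + py * py + pz * pz), px, py, pz)


w_prong_1 = massless(500.0, 0.0, 0.0)
w_prong_2 = massless(300.0, 63.0, 0.0)
b_subjet = massless(100.0, -40.0, 25.0)
subjets = [w_prong_1, w_prong_2, b_subjet]
for f_cut in (0.05, 0.2):
    kept = trim(subjets, f_cut)
    print(f"f_cut = {f_cut}: {len(kept)} subjets, groomed mass = {inv_mass(kept):.1f} GeV")
```

With the loose cut all three subjets survive and the groomed mass is about 173 GeV; with the aggressive cut the $b$-subjet falls below threshold and the groomed mass drops to about 81 GeV, i.e. the jet migrates to the spurious peak at $m_W$.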

In Figures 31 and 32 we directly compare ROC curves
for jet-shape variable performance and top-mass performance,
respectively, in three different pT bins whilst keeping the
jet radius fixed at R = 0.8. The input parameters of the
taggers, groomers and shape variables are separately optimized
in each pT bin. One can see from Figure 31 that the tagging
performance of jet shapes does not change substantially
with pT. The variables τ_32^(β=1) and Γ_Qjet have the most
variation and tend to degrade with higher pT, as can be seen in
Figure 33. This was also observed in the W-tagging studies
in Section 6, and makes sense, as higher-pT QCD jets have
more, harder emissions within the jet, giving rise to
substructure that fakes the signal. For the variable Γ_Qjet (again
as discussed in Section 6), increasing pT leads to QCD jets
with a narrower volatility distribution due to the enhanced
contribution of the "shoulder" region, while for the signal
(top) jets the increased amount of soft radiation with
increasing pT results in a broader volatility distribution. Thus,
with increasing pT the signal and background jets exhibit more
similar volatility distributions, as we see explicitly in
Figures 33 (a) and (b), and Γ_Qjet becomes less discriminant
for top identification. By contrast, from Figure 32 we can
see that most of the top-mass variables have superior
performance at higher pT, due to the radiation from the top
quark becoming more collimated. The notable exception is the
HEPTopTagger, which degrades at higher pT, likely in part due
to the background-shaping effects studied above; this is at
least partially mitigated by recent updates to the
HEPTopTagger [63, 64].
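For reference, the Qjets mass volatility discussed above is, as we understand its original definition, the relative spread of the jet mass over an ensemble of randomized reclusterings of the same jet, Γ_Qjet = sqrt(<m²> - <m>²)/<m>. The Python sketch below only evaluates that ratio from a list of already-reclustered masses. Generating the ensemble, which is the non-deterministic reclustering itself, is not shown. The function name `qjet_volatility` and the toy masses are illustrative assumptions, not part of this study's code.

```python
from statistics import fmean, pstdev


def qjet_volatility(masses):
    """Return Gamma = sqrt(<m^2> - <m>^2) / <m> for one jet.

    `masses` is assumed to hold the jet masses (GeV) obtained from an
    ensemble of Qjets-style randomized reclusterings of a single jet.
    """
    if len(masses) == 0:
        raise ValueError("need at least one reclustered mass")
    mean_mass = fmean(masses)
    if mean_mass <= 0:
        raise ValueError("mean jet mass must be positive")
    # pstdev is the population standard deviation, sqrt(<m^2> - <m>^2).
    return pstdev(masses) / mean_mass


# Toy ensembles (not taken from this study): a narrowly spread mass
# gives a small volatility, a widely spread one a large volatility.
stable = [170.0, 175.0, 180.0]
volatile = [120.0, 175.0, 230.0]
print(round(qjet_volatility(stable), 4), round(qjet_volatility(volatile), 4))
```

In this language, a narrower QCD volatility distribution and a broader top one at high pT simply mean that the per-jet values of this ratio overlap more between the two samples.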

In Figures 34 and 35 we directly compare ROC curves
for jet-shape variable performance and top-mass performance,
respectively, for three different jet radii within the
pT = 1.5-1.6 TeV bin. Again, the input parameters of the taggers,
groomers and shape variables are separately optimized for
each jet radius. We can see from these figures that most of
the top-tagging variables, both shape and reconstructed top
mass, perform best for smaller radius, as was generally
observed in the case of W-tagging in Section 6. This is likely
because, at such high pT, most of the radiation from the top
quark is confined within R = 0.4, and having a larger jet
radius makes the variable more susceptible to contamination
from the underlying event and other uncorrelated radiation.
In Figure 36, we compare the individual top signal and QCD
background distributions for each shape variable considered
in the pT = 1.5-1.6 TeV bin for the various jet radii. In
Figures 36 (a) to (h) the distributions for both signal and
background broaden with increasing R, degrading the
discriminating power. For C_2^(β=1) and C_3^(β=1), the background
distributions are shifted to larger values as well. For the
variable Γ_Qjet, as already discussed for increasing pT (and in
Section 6), the behavior with increasing R is a bit more
complicated, with the QCD jets becoming less volatile and the
signal jets more volatile, i.e., the two volatility distributions
become more similar as we move from Figure 36 (i) to
Figure 36 (j). So again the discriminating power decreases with
increasing R. The main exception is C_2^(β=1), which performs
optimally at R = 0.8; in this case, the signal and background
coincidentally happen to have the same distribution around
R = 0.4, and so R = 0.8 gives better discrimination.

Fig. 29 Comparison of top mass reconstruction with the Johns Hopkins (JH), HEPTopTagger (HEP), pruning, and trimming at different R using the anti-kT algorithm in the pT = 1.5-1.6 TeV bin. Each histogram is shown for the working point optimized for best performance with m_t in the 0.3-0.35 signal efficiency bin, and is normalized to the fraction of events passing the tagger. In this and subsequent plots, the HEPTopTagger distribution cuts off at 500 GeV because the tagger fails to tag jets with a larger mass.

Fig. 30 Comparison of top mass reconstruction with the Johns Hopkins (JH), HEPTopTagger (HEP), pruning, and trimming at different pT using the anti-kT algorithm, R = 0.8. Each histogram is shown for the working point optimized for best performance with m_t in the 0.3-0.35 signal efficiency bin, and is normalized to the fraction of events passing the tagger.

Fig. 31 Comparison of individual jet shape performance at different pT using the anti-kT R = 0.8 algorithm.

Fig. 32 Comparison of top mass performance of different taggers at different pT using the anti-kT R = 0.8 algorithm.

Fig. 33 Comparison of Γ_Qjet and τ_32^(β=1) at R = 0.8 and different values of pT. These shape variables are the most sensitive to varying pT. Panels: (a) Γ_Qjet, pT = 600-700 GeV; (b) Γ_Qjet, pT = 1.5-1.6 TeV; (c) τ_32^(β=1), pT = 600-700 GeV.


7.3 Performance of Multivariable Combinations 

We now consider various BDT combinations of the single
variables considered in the last section, using the techniques
described in Section 4. In particular, we consider the
performance of individual taggers such as the JH tagger and
HEPTopTagger, which output information about the top and
W candidate masses and the helicity angle; for each tagger,
all three output variables are combined in a BDT. For trimming
and pruning, the output candidate m_W and m_t are combined
in a BDT. Finally, we consider the combination of the
full set of outputs of each of the above taggers/groomers
with the shape variables, as well as a combination of the
outputs of the HEPTopTagger and JH tagger. This allows
us to determine the degree of complementary information
in taggers/groomers and shape variables, as well as between
the top tagging algorithms themselves. For all variables with
tuneable input parameters, we scan and optimize over
realistic values of such parameters, as described in Section 7.1.

Fig. 34 Comparison of individual jet shape performance at different R in the pT = 1.5-1.6 TeV bin.

Fig. 35 Comparison of top mass performance of different taggers at different R in the pT = 1.5-1.6 TeV bin.


In Figure 37, we directly compare the performance of
the HEPTopTagger, the JH tagger, trimming, and pruning, in
the pT = 1-1.1 TeV bin with R = 0.8, where both m_t and
m_W are used in the groomers. Generally, we find that pruning,
which does not naturally incorporate subjets into the
algorithm, does not perform as well as the others.
Interestingly, trimming, which does include a subjet-identification
step, performs comparably to the standard HEPTopTagger
over much of the range, possibly due to the background-shaping
observed in Section 7.2, although this may change
with recently proposed updates to the HEPTopTagger [63, 64].
By contrast, the JH tagger outperforms the other standard
algorithms. To determine whether there is complementary
information in the mass outputs from different top taggers, we
also consider in Figure 37 a multivariable combination of
all of the JH and HEPTopTagger outputs. The maximum
efficiency of the combined JH and HEPTopTaggers is limited,
as some fraction of signal events inevitably fails one
or other of the taggers. We do see a 20-50% improvement
in performance when combining all outputs, which suggests
that the different algorithms used to identify the top and W
in the different taggers contain complementary information.
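The efficiency ceiling for the combined JH plus HEPTopTagger discriminant follows from simple bookkeeping. This assumes, as we do here, that the combined BDT can only be evaluated on jets for which both taggers return a top candidate. A minimal sketch, with a hypothetical helper name and toy pass/fail flags:

```python
def combined_max_efficiency(passes_a, passes_b):
    """Largest signal efficiency a combination needing both taggers can reach.

    passes_a[i] and passes_b[i] say whether tagger A and tagger B returned
    a top candidate for signal jet i; only jets tagged by both can enter
    the combined discriminant.
    """
    if len(passes_a) != len(passes_b) or not passes_a:
        raise ValueError("need two non-empty lists of equal length")
    n_both = sum(1 for a, b in zip(passes_a, passes_b) if a and b)
    return n_both / len(passes_a)


# Toy flags: each tagger alone keeps 4 of 5 jets, but they fail on
# different jets, so the combination can reach at most 3 of 5.
jh_flags = [True, True, True, False, True]
hep_flags = [True, False, True, True, True]
print(combined_max_efficiency(jh_flags, hep_flags))
```

The ceiling is therefore never higher than the smaller of the two single-tagger efficiencies. It drops further the less the two taggers agree on which signal jets to accept.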


Fig. 36 Comparison of various shape variables in the pT = 1.5-1.6 TeV bin and different values of the anti-kT radius R.

In Figure 38 we present the results for multivariable
combinations of the top tagger outputs with and without shape
variables. We see that, for both the HEPTopTagger and the
JH tagger, the shape variables contain additional information
uncorrelated with the masses and helicity angle, and
give on average a factor 2-3 improvement in signal
discrimination. When combined with the tagger outputs,
both the energy correlation functions C_2 + C_3 and the
N-subjettiness ratios τ_21 + τ_32 give comparable performance,
while Γ_Qjet is slightly worse; this is unsurprising, as Qjets
accesses shape information in a more indirect way than the other
shape variables. Combining all shape variables with a
single top tagger provides even greater enhancement in
discrimination power. We directly compare the performance
of the JH and HEPTopTaggers in Figure 38(c). Combining
the taggers with shape information nearly erases the
difference between the tagging methods observed in Figure 37;
this indicates that combining the shape information with the
HEPTopTagger identifies the differences between signal and
background missed by the standard tagger alone. This also
suggests that further improvement to discriminating power
may be minimal, as various multivariable combinations
converge to within 20% or so.
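The ROC comparisons in this section quote background rejection, the inverse of background efficiency, at a fixed signal efficiency. Statements such as "within 20% or so" are ratios of that quantity. A minimal Python sketch of the bookkeeping follows. It assumes unweighted jets and one discriminant score per jet, where a larger score means more top-like (for example a BDT output). The helper names are hypothetical.

```python
import math


def bkg_eff_at_sig_eff(sig_scores, bkg_scores, target_sig_eff):
    """Background efficiency of the cut `score >= threshold`, using the
    tightest threshold that still keeps at least `target_sig_eff` of the
    signal (larger scores are assumed to be more signal-like)."""
    if not 0.0 < target_sig_eff <= 1.0:
        raise ValueError("target signal efficiency must lie in (0, 1]")
    if not sig_scores or not bkg_scores:
        raise ValueError("need signal and background scores")
    ranked = sorted(sig_scores, reverse=True)
    # Number of signal jets that must pass; the small offset stops float
    # noise such as 0.3 * 10 = 3.0000000000000004 from rounding up.
    n_pass = max(1, math.ceil(target_sig_eff * len(ranked) - 1e-9))
    threshold = ranked[n_pass - 1]
    return sum(1 for s in bkg_scores if s >= threshold) / len(bkg_scores)


def rejection_gain(bkg_eff_reference, bkg_eff_candidate):
    """Ratio of background rejections 1/eff, candidate over reference."""
    if bkg_eff_candidate == 0.0:
        return math.inf
    return bkg_eff_reference / bkg_eff_candidate


# Toy scores (illustrative only).
signal = [0.9, 0.8, 0.7, 0.6]
background = [0.85, 0.5, 0.4, 0.3, 0.2]
print(bkg_eff_at_sig_eff(signal, background, 0.5))
```

Under this convention, two combinations that agree to about 20% have a `rejection_gain` close to 1 (roughly 0.8 to 1.2) when compared at the same signal efficiency.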

In Figure 39 we present the results for multivariable
combinations of groomer outputs with and without shape
variables. As with the tagging algorithms, combinations of
groomers with shape variables improve their discriminating
power; combinations with τ_32 + τ_21 perform comparably to
those with C_3 + C_2, and both of these are superior to
combinations with the mass volatility, Γ_Qjet. Substantial
further improvement is possible by combining the groomers with
all shape variables. Not surprisingly, the taggers that lag behind
in performance enjoy the largest gain in signal-background
discrimination with the addition of shape variables. Once
again, in Figure 39(c), we find that the differences between
pruning and trimming are erased when combined with shape
information.

Finally, in Figure 40, we compare the performance of
each of the taggers/groomers when their outputs are combined
with all of the shape variables considered. One can see
that the discrepancies between the performance of the
different taggers/groomers all but vanish, suggesting perhaps
that we are here utilising all available signal-background
discrimination information, and that this is the optimal top
tagging performance that could be achieved in these
conditions.


Up to this point, we have considered only the combined
multivariable performance in the pT = 1.0-1.1 TeV bin with
jet radius R = 0.8. We now compare the BDT combinations
of tagger outputs, with and without shape variables, at
different pT. The taggers are optimized over all input
parameters for each choice of pT and signal efficiency. As with
the single-variable study, we consider anti-kT jets clustered
with R = 0.8 and compare the outcomes in the pT = 500-600
GeV, pT = 1-1.1 TeV, and pT = 1.5-1.6 TeV bins. The
comparison of the taggers/groomers is shown in Figure 41. The
behaviour with pT is qualitatively similar to the behaviour
of the m_t variable for each tagger/groomer shown in
Figure 32; this suggests that the pT behaviour of the taggers
is dominated by the top-mass reconstruction. As before, the
standard HEPTopTagger performance degrades slightly with
increased pT due to the background-shaping effect (which
may be mitigated by recently proposed updates), while the
JH tagger and groomers modestly improve in performance.

Fig. 37 The performance of the various taggers in the pT = 1-1.1 TeV bin using the anti-kT R = 0.8 algorithm. For the groomers a BDT combination of the reconstructed m_t and m_W is used. Also shown is a multivariable combination of all of the JH and HEPTopTagger outputs. The ungroomed mass performance is shown for comparison.

Fig. 38 The performance of BDT combinations of the JH and HEPTopTagger outputs with various shape variables in the pT = 1-1.1 TeV bin using the anti-kT R = 0.8 algorithm. Taggers are combined with the following shape variables: τ_21^(β=1) + τ_32^(β=1), C_2^(β=1) + C_3^(β=1), Γ_Qjet, and all of the above (denoted "shape").

Fig. 39 The performance of the BDT combinations of the trimming and pruning outputs with various shape variables in the pT = 1-1.1 TeV bin using the anti-kT R = 0.8 algorithm. Groomer mass outputs are combined with the following shape variables: τ_21^(β=1) + τ_32^(β=1), C_2^(β=1) + C_3^(β=1), Γ_Qjet, and all of the above (denoted "shape").

Fig. 40 Comparison of the performance of the BDT combinations of all the groomer/tagger outputs with all the available shape variables in the pT = 1-1.1 TeV bin using the anti-kT R = 0.8 algorithm. Tagger/groomer outputs are combined with all of the following shape variables: τ_21^(β=1), τ_32^(β=1), C_2^(β=1), C_3^(β=1), Γ_Qjet.

In Figure 42, we show the pT-dependence of BDT
combinations of the JH tagger output combined with shape
variables. In terms of pT dependence, we find that the curves
look nearly identical to Figure 41(b): the pT dependence is
again dominated by the top-mass reconstruction, and combining
the tagger outputs with different shape variables does
not substantially change this behavior. Although not shown
here, the same behavior is observed for trimming and pruning.
By contrast, the pT dependence of the HEPTopTagger
ROC curves, shown in Figure 43, does change somewhat
when combined with different shape variables. Because the
HEPTopTagger performs suboptimally at high pT in
the conventional configuration, we find that combining the
HEPTopTagger with C_3^(β=1), which in Figure 31(b) is seen to
have some modest improvement at high pT, can improve its
performance. Combining the standard HEPTopTagger with
multiple shape variables gives the maximum improvement
in performance at high pT relative to low pT.

In Figure 44 we compare the BDT combinations of tagger
outputs, with and without shape variables, at different
jet radius R in the pT = 1.5-1.6 TeV bin. The taggers are
optimized over all input parameters for each choice of R and
signal efficiency. We find that, for all taggers and groomers,
the performance is always best at small R: such a choice of
R is sufficiently large to admit the full top quark decay at
such high pT, but is small enough to suppress contamination
from additional radiation. This is not altered when the
taggers are combined with shape variables. For example,
Figure 45 shows the dependence on R of the JH tagger when
combined with shape variables, where one can see that the
R-dependence is identical for all combinations. The same
holds true for the HEPTopTagger, trimming, and pruning.
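The argument about R can be made semi-quantitative with a common rule of thumb. The decay products of a particle of mass m and transverse momentum pT ≫ m are spread over an angular distance ΔR ≈ 2m/pT. This estimate is strictly a two-body one, so for the three-body top decay it is only an order-of-magnitude guide. The function name and the top mass value of 173 GeV are assumptions of this sketch.

```python
def decay_cone_radius(mass_gev, pt_gev):
    """Rule-of-thumb angular size Delta R ~ 2 m / pT of a boosted decay.

    Only meaningful for pT well above the mass; a two-body estimate.
    """
    if mass_gev <= 0 or pt_gev <= 0:
        raise ValueError("mass and pT must be positive")
    return 2.0 * mass_gev / pt_gev


TOP_MASS_GEV = 173.0  # assumed value for illustration
# Centres of the 0.6-0.7, 1.0-1.1 and 1.5-1.6 TeV bins used in this section.
for pt in (650.0, 1050.0, 1550.0):
    print(pt, round(decay_cone_radius(TOP_MASS_GEV, pt), 2))
```

At pT ≈ 1.55 TeV this gives ΔR ≈ 0.22, comfortably inside R = 0.4. At pT ≈ 650 GeV it exceeds 0.5, which is consistent with larger radii being needed to contain the decay at lower boosts.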


7.4 Performance at Sub-Optimal Working Points 

Up until now, we have re-optimized our tagger and groomer
parameters for each pT, R, and signal efficiency working
point. In reality, experiments will choose a finite set of
working points to use. When this is taken into account, how will
the top-tagging performance compare to the optimal results
already shown? To address this question, we replicate our
analyses, but optimize the top taggers only for a single pT
bin, single jet radius R, or single signal efficiency, and
subsequently apply the same parameters to other scenarios. This
allows us to determine the extent to which re-optimization
is necessary to maintain the high signal-to-background
discrimination power seen in the top-tagging algorithms we
studied. In this section, we focus on the taggers and groomers,
and their combination with shape variables, as the shape
variables alone typically do not have any input parameters
to optimize.

Optimizing at a single pT: We show in Figure 46 the
performance of the reconstructed top mass for the pT = 0.6-0.7
TeV and pT = 1.0-1.1 TeV bins, with all input parameters
optimized to the pT = 1.5-1.6 TeV bin (and R = 0.8
throughout). This is normalized to the performance using the
optimized tagger inputs at each pT. The performance
degradation is at the level of 20-30% (at maximum 50%) when the
high-pT optimized inputs are used at other momenta, with
trimming and the Johns Hopkins tagger degrading the most.
The jagged behaviour of the points is due to the finite
resolution of the scan. We also observe a particular effect
associated with using suboptimal taggers: since taggers
sometimes fail to return a top candidate, parameters optimized
for a particular signal efficiency ε_sig at pT = 1.5-1.6 TeV
may not return enough signal candidates to reach the same
efficiency at a different pT. Consequently, no point appears
for that pT value. This is not often a practical concern, as the
largest gains in signal discrimination and significance are for
smaller values of ε_sig, but it may be an important effect to
consider when selecting benchmark tagger parameters and
signal efficiencies.
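The normalization used in Figures 46 to 51, including the missing points, can be sketched as follows. Parameters are frozen at the choice that is best at a reference working point. At every other working point, the rejection they give is divided by the best rejection any scanned choice achieves there. The dictionary layout, the function name and the toy numbers are hypothetical and serve only to illustrate the procedure described in the text.

```python
def normalized_performance(bkg_eff, reference_point):
    """Compare parameters frozen at one working point with re-optimized ones.

    bkg_eff[params][point] is the background efficiency of one parameter
    choice at one working point, or None if that choice cannot reach the
    point's target signal efficiency (too few top candidates returned).
    For every working point the result is
        rejection(frozen params) / rejection(best params at that point),
    which is at most 1, or None when the frozen parameters cannot reach it.
    """
    usable = [p for p, effs in bkg_eff.items() if effs.get(reference_point) is not None]
    if not usable:
        raise ValueError("no parameter choice reaches the reference point")
    # Freeze the parameters giving the lowest background at the reference.
    frozen = min(usable, key=lambda p: bkg_eff[p][reference_point])
    points = sorted({point for effs in bkg_eff.values() for point in effs})
    result = {}
    for point in points:
        frozen_eff = bkg_eff[frozen].get(point)
        if frozen_eff is None:
            result[point] = None  # "no point appears" for this working point
            continue
        best_eff = min(effs[point] for effs in bkg_eff.values()
                       if effs.get(point) is not None)
        # Rejection is 1 / efficiency, so the ratio of rejections is best/frozen.
        result[point] = 1.0 if frozen_eff == 0.0 else best_eff / frozen_eff
    return result


# Toy background efficiencies for three hypothetical parameter sets.
toy = {
    "set_a": {"0.6-0.7 TeV": 0.020, "1.0-1.1 TeV": 0.010, "1.5-1.6 TeV": 0.004},
    "set_b": {"0.6-0.7 TeV": 0.010, "1.0-1.1 TeV": 0.012, "1.5-1.6 TeV": None},
    "set_c": {"0.6-0.7 TeV": 0.030, "1.0-1.1 TeV": 0.015, "1.5-1.6 TeV": 0.005},
}
print(normalized_performance(toy, "1.5-1.6 TeV"))
```

Freezing at the highest-pT bin here costs a factor 2 in rejection at 0.6-0.7 TeV. Freezing at the lowest bin instead selects `set_b`, which cannot reach the 1.5-1.6 TeV working point at all. That is the toy analogue of a missing point.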

The degradation in performance is more pronounced for
the BDT combinations of the full tagger outputs, shown in
Figure 47. This is particularly true at very low signal
efficiency, where the optimization of inputs picks out a cut on
the tail of some distribution that depends precisely on the
pT and R of the jet. Once again, trimming and the Johns
Hopkins tagger degrade more markedly. Similar behavior holds
for the BDT combinations of tagger outputs plus all shape
variables.

Optimizing at a single R: In Figure 48, we show the
performance of the reconstructed top mass for R = 0.4 and 0.8,
with all input parameters optimized to R = 1.2 (and
pT = 1.5-1.6 TeV throughout). This is normalized to the
performance using the optimized tagger inputs at each R. While
the performance of each variable degrades at small ε_sig
compared to the optimized search, the HEPTopTagger fares the
worst. It is not surprising that a tagger whose top mass
reconstruction is susceptible to background-shaping at large
R and pT would require a more careful optimization of
parameters to obtain the best performance; recent updates to
the tagger algorithm [63, 64] may mitigate the need for this
more careful optimization.

Fig. 41 Comparison at different pT of the performance of various top tagging/grooming algorithms using the anti-kT R = 0.8 algorithm. For each tagger/groomer, all output variables are combined in a BDT.

The same holds true for the BDT combinations of the
full tagger outputs, shown in Figure 49. The performance
for the sub-optimal taggers is still within an O(1) factor
of the optimized performance, and the HEPTopTagger
performs better with the combination of all of its outputs
relative to the performance with just m_t. The same behaviour
holds for the BDT combinations of tagger outputs and shape
variables.

Optimizing at a single efficiency: The strongest
assumption we have made so far is that the taggers can be
re-optimized for each signal efficiency point. This is useful
for making a direct comparison of the power of different
top-tagging algorithms, but is not particularly practical for LHC
analyses. We now consider the scenario in which the tagger
inputs are optimized once, in the ε_sig = 0.3-0.35 bin, and then
used for all signal efficiencies. We do this in the
pT = 1.0-1.1 TeV bin and with R = 0.8.

The performance of each tagger, normalized to its
performance optimized in each signal efficiency bin, is shown
in Figure 50 for cuts on the top mass and W mass, and in
Figure 51 for BDT combinations of tagger outputs and shape
variables. In both plots, it is apparent that optimizing the
taggers in the ε_sig = 0.3-0.35 efficiency bin gives comparable
performance over efficiencies ranging from 0.2 to 0.5,
although performance degrades at substantially different signal
efficiencies. Pruning appears to give especially robust
signal-background discrimination without re-optimization,
most likely because there are no absolute distance or pT
scales in the algorithm. Figures 50 and 51 suggest that,
while optimization at all signal efficiencies is a useful tool
for comparing different algorithms, it is not crucial to achieve
good top-tagging performance in experiments.
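The scale argument for pruning can be seen directly in its veto condition. In the commonly used formulation, which is recalled here and not restated in this section, a merging i + j is rejected during reclustering of the jet when it is both soft, z = min(pT_i, pT_j)/pT_(i+j) < z_cut, and wide-angle, ΔR_ij > D_cut = 2 R_cut m_J/pT_J. The sketch below assumes the frequently quoted defaults z_cut = 0.1 and R_cut = 0.5 (the study itself scans these parameters) and uses hypothetical kinematics. It shows that rescaling all momenta and masses by a common factor leaves the decision unchanged, because z and D_cut are both ratios. By contrast, trimming reclusters into subjets of a fixed radius R_sub.

```python
def pruning_veto(pt_i, pt_j, pt_merged, delta_r_ij, jet_mass, jet_pt,
                 z_cut=0.1, r_cut=0.5):
    """Pruning test for one recombination i + j -> merged.

    The softer branch is discarded when the merging is both soft,
        z = min(pt_i, pt_j) / pt_merged < z_cut,
    and wide-angle,
        delta_r_ij > D_cut = 2 * r_cut * jet_mass / jet_pt.
    """
    z = min(pt_i, pt_j) / pt_merged
    d_cut = 2.0 * r_cut * jet_mass / jet_pt
    return z < z_cut and delta_r_ij > d_cut


# Hypothetical soft, wide-angle merging inside a 1 TeV jet of mass 175 GeV.
soft_wide = dict(pt_i=10.0, pt_j=200.0, pt_merged=205.0, delta_r_ij=0.5,
                 jet_mass=175.0, jet_pt=1000.0)
# Multiply every momentum and mass by the same factor; angles are unchanged.
scaled = {k: (v if k == "delta_r_ij" else 3.0 * v) for k, v in soft_wide.items()}
print(pruning_veto(**soft_wide), pruning_veto(**scaled))
```

Both calls give the same decision. The only inputs that carry units enter as ratios, so the pruning decision depends on the jet's internal kinematics and not on its overall energy scale.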

7.5 Conclusions 

We have studied various jet substructure variables, groomed
masses, and top taggers to assess the performance of top
tagging at different pT and jet radius. At each pT, R, and
signal efficiency working point, we optimize the parameters for
those variables with tuneable inputs. Overall, we have found
that these techniques, individually and in combination, continue
to perform well at high pT, at least at the particle level, which
is important for future LHC running. In general, the Johns
Hopkins tagger performs best, while jet grooming algorithms
under-perform relative to the best top taggers due to the lack
of an optimized W-identification step. Tagger performance can
be improved by a further factor of 2-4 through combination with
jet substructure variables such as τ_32, C_3, and Γ_Qjet. When
combined with jet substructure variables, the performance
of various groomers and taggers becomes very comparable,
suggesting that, taken together, the variables studied are
sensitive to nearly all of the physical differences between top
and QCD jets at particle level. A small improvement is also
found by combining the Johns Hopkins and HEPTopTaggers,
indicating that different taggers are not fully correlated.
The degree to which these findings continue to hold under
more realistic pile-up and detector configurations is, however,
not addressed in this analysis and left to future study.

Fig. 42 Comparison at different pT of the performance of the JH tagger using the anti-kT R = 0.8 algorithm, where all tagger output variables are combined in a BDT with various shape variables.


Comparing results at different pT and R, top-tagging
performance is generally better at smaller R due to less
contamination from uncorrelated radiation. Similarly, most variables
perform better at larger pT due to the higher degree of
collimation of radiation. Some variables fare worse at higher
pT, such as the N-subjettiness ratio τ_32 and the Qjet mass
volatility Γ_Qjet, as higher-pT QCD jets have more and harder
emissions that fake the top-jet substructure. The standard
HEPTopTagger algorithm is also worse at high pT due to the
tendency of the tagger to shape backgrounds around the top
mass. This is unsurprising, given that the HEPTopTagger was
specifically designed for a lower pT range than that
considered here; recently proposed updates may improve
performance at high pT and R [63, 64]. The pT- and R-dependence
of the multivariable combinations is dominated by the pT-
and R-dependence of the top mass reconstruction component
of the tagger/groomer.

Finally, we consider the performance of various tagger
and jet substructure variable combinations under the more
realistic assumption that the input parameters are only
optimized at a single pT, R, or signal efficiency, and then the
same inputs are used at other working points. Remarkably,
the performance of all variables is typically within a factor
of 2 of the fully optimized inputs, suggesting that while
optimization can lead to substantial gains in performance, the
general behavior found in the fully optimized analyses
extends to more general applications of each variable. In
particular, the performance of pruning typically varies the least
when comparing sub-optimal working points to the fully
optimized tagger, due to the scale-invariant nature of the
pruning algorithm.

Fig. 43 Comparison at different pT of the performance of the HEPTopTagger using the anti-kT R = 0.8 algorithm, where all tagger output variables are combined in a BDT with various shape variables.

8 Summary & Conclusions 

Furthering our understanding of jet substructure is crucial
to enhancing the prospects for the discovery of new physical
processes at Run II of the LHC. In this report we have
studied the performance of jet substructure techniques over
a wide range of kinematic regimes that will be encountered
in Run II of the LHC. The performance of observables and
their correlations have been studied by combining the
variables into Boosted Decision Tree (BDT) discriminants, and
comparing the background rejection power of these
discriminants to the rejection power achieved by the individual
variables. The performance of "all variables" BDT
discriminants has also been investigated, to understand the potential
of the "ultimate" tagger where "all" available particle-level
information (at least, all of that provided by the variables
considered) is used.

We focused on the discrimination of quark jets from gluon
jets, and the discrimination of boosted W bosons and top
quarks from the QCD backgrounds. For each, we have
identified the best-performing jet substructure observables at
particle level, both individually and in combination with other
observables. In doing so, we have also provided a physical
picture of why certain sets of observables are (un)correlated.
Additionally, we have investigated how the performance of
jet substructure observables varies with R and pT,
identifying observables that are particularly robust against or
susceptible to these changes. In the case of q/g tagging, it seems
that the ideal performance can be nearly achieved by
combining the most powerful discriminant, the number of
constituents of a jet, with just one other variable, C_1^(β=1)
(or τ_1^(β=1)).
Fig. 44 Comparison at different radii of the performance of various top tagging/grooming algorithms with pT = 1.5-1.6 TeV. For each tagger/groomer, all output variables are combined in a BDT.


Many of the other variables considered are highly
correlated and provide little additional discrimination. For both
top and W tagging, the groomed mass is a very important
discriminating variable, but one that can be substantially
improved in combination with other variables. There is clearly
a rich and complex relationship between the variables
considered for W and top tagging, and the performance and
correlations between these variables can change considerably
with changing jet pT and R. In the case of W tagging,
even after combining groomed mass with two other
substructure observables, we are still some way short of the
ultimate tagger performance, indicating the complexity of
the information available, and the complementarity between
the observables considered. In the case of top tagging, we
have shown that the performance of both the Johns Hopkins
tagger and the HEPTopTagger can be improved when their
outputs are combined with substructure observables such as τ_32
and C_3, and that the performance of a discriminant built
from groomed mass information plus substructure observables
is very comparable to the performance of the taggers.
We have optimized the top taggers for particular values of
pT, R, and signal efficiency, and studied their performance
at other working points. We have found that the performance
of observables remains within at most a factor of two of
the optimized value, suggesting that the performance of jet
substructure observables is not significantly degraded when
tagger parameters are only optimized for a few select
benchmark points.

In all of q/g, W and top tagging, we have observed that
the tagging performance improves with increasing pT.
However, whereas for q/g and top tagging the performance
improves with decreasing R (for the range of R considered
here), the dependence on R for W tagging is more complex,
with a peak performance at R = 0.8 for each pT bin
considered.

Our analyses were performed with ideal detector and
pile-up conditions in order to most clearly elucidate the
underlying physical scaling with pT and R. At higher boosts,
detector resolution effects will become more important, and
with the higher pile-up expected at Run II of the LHC,
pile-up mitigation will be crucial for future jet substructure
studies. Future studies will be needed to determine which of the
observables we have studied are most robust against pile-up
and detector effects, and our analyses suggest particularly
useful combinations of observables to consider in such
studies.

Fig. 45 Comparison at different radii of the performance of the JH tagger in the pT = 1.5-1.6 TeV bin, where all tagger output variables are combined in a BDT with various shape variables.

At the new energy frontier of Run II of the LHC, boosted
jet substructure techniques will be more central to our searches
for new physics than ever before. By achieving a deeper
understanding of the underlying structure of quark, gluon, W
and top-initiated jets, as well as the relations between
observables sensitive to their respective structures, it is hoped
that more sophisticated analyses can be performed that will
maximally extend the reach for new physics.

In particular, we thank the winner of the competition, Ms. Hallie
Bolonkin, for creating the final design.